BACKGROUND OF THE INVENTION

1. Field of the Invention:

The present invention relates generally to a 
simplified, reduced instruction set computer (RISC)
microprocessor. More particularly, it relates to such a
microprocessor which is capable of performance levels of, 
for example, 20 million instructions per second (MIPS) at 
a price of, for example, 20 dollars. 

2. Description of the Prior Art:

Since the invention of the microprocessor, 
improvements in its design have taken two different 
approaches. In the first approach, a brute force gain in 
performance has been achieved through the provision of
greater numbers of faster transistors in the
microprocessor integrated circuit and an instruction set
of increased complexity. This approach is exemplified by
the Motorola 68000 and Intel 80X86 microprocessor
families. The trend in this approach is toward larger die
sizes and packages, with hundreds of pinouts.

More recently, it has been perceived that performance 
gains can be achieved through comparative simplicity, both 
in the microprocessor integrated circuit itself and in its 
instruction set. This second approach provides RISC
microprocessors, and is exemplified by the Sun SPARC and
the Intel 8960 microprocessors. However, even with this
approach as conventionally practiced, the packages for the
microprocessor are large, in order to accommodate the
large number of pinouts that continue to be employed. A
need therefore remains for further simplification of high
performance microprocessors.

With conventional high performance microprocessors,
fast static memories are required for direct connection to
the microprocessors in order to allow memory accesses that
are fast enough to keep up with the microprocessors.
Slower dynamic random access memories (DRAMs) are used
with such microprocessors only in a hierarchical memory
arrangement, with the static memories acting as a buffer
between the microprocessors and the DRAMs. The necessity
to use static memories increases the cost of the resulting
systems.

Conventional microprocessors provide direct memory 
accesses (DMA) for system peripheral units through DMA 
controllers, which may be located on the microprocessor 
integrated circuit, or provided separately. Such DMA 
controllers can provide routine handling of DMA requests 
and responses, but some processing by the main central 
processing unit (CPU) of the microprocessor is required. 

SUMMARY OF THE INVENTION 

Accordingly, it is an object of this invention to 
provide a microprocessor with a reduced pin count and cost 
compared to conventional microprocessors. 

It is another object of the invention to provide a 
high performance microprocessor that can be directly 
connected to DRAMs without sacrificing microprocessor
speed. 

It is a further object of the invention to provide a 
high performance microprocessor in which DMA does not 
require use of the main CPU during DMA requests and 
responses and which provides very rapid DMA response with 
predictable response times.

The attainment of these and related objects may be
achieved through use of the novel high performance, low
cost microprocessor herein disclosed. In accordance with
one aspect of the invention, a microprocessor system in
accordance with this invention has a central processing
unit, a dynamic random access memory and a bus connecting
the central processing unit to the dynamic random access
memory. There is a multiplexing means on the bus between
the central processing unit and the dynamic random access
memory. The multiplexing means is connected and
configured to provide row addresses, column addresses and
data on the bus.

In accordance with another aspect of the invention,
the microprocessor system has a means connected to the
bus for fetching instructions for the central processing
unit on the bus. The means for fetching instructions is
configured to fetch multiple sequential instructions in a
single memory cycle. In a variation of this aspect of
the invention, a programmable read only memory containing
instructions for the central processing unit is connected
to the bus. The means for fetching instructions includes
means for assembling a plurality of instructions from the
programmable read only memory and storing the plurality of
instructions in the dynamic random access memory.

In another aspect of the invention, the 
microprocessor system includes a central processing unit, 
a direct memory access processing unit and a memory 
connected by a bus. The direct memory access processing 
unit includes means for fetching instructions for the
central processing unit and for fetching instructions for 
the direct memory access processing unit on the bus. 

In a further aspect of the invention, the 
microprocessor system, including the memory, is contained 
in an integrated circuit. The memory is a dynamic random
access memory, and the means for fetching multiple
instructions includes a column latch for receiving the
multiple instructions.

In still another aspect of the invention, the
microprocessor system additionally includes an
instruction register for the multiple instructions
connected to the means for fetching instructions. A means
is connected to the instruction register for supplying
the multiple instructions in succession from the
instruction register. A counter is connected to control
the means for supplying the multiple instructions to
supply the multiple instructions in succession. A means
for decoding the multiple instructions is connected to
receive the multiple instructions in succession from the
means for supplying the multiple instructions. The
counter is connected to said means for decoding to
receive incrementing and reset control signals from the
means for decoding. The means for decoding is configured
to supply the reset control signal to the counter and to
supply a control signal to the means for fetching
instructions in response to a SKIP instruction in the
multiple instructions. In a modification of this aspect
of the invention, the microprocessor system additionally
has a loop counter connected to receive a decrement
control signal from the means for decoding. The means
for decoding is configured to supply the reset control
signal to the counter and the decrement control signal to
the loop counter in response to a MICROLOOP instruction in
the multiple instructions. In a further modification to
this aspect of the invention, the means for decoding is
configured to control the counter in response to an
instruction utilizing a variable width operand. A means
is connected to the counter to select the variable width
operand in response to the counter.
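
The counter-driven sequencing of this aspect can be pictured with a small simulation. The following Python sketch is illustrative only: the function name run_group, the opcode strings other than SKIP and MICROLOOP, and the rule that MICROLOOP repeats the group while the loop counter is nonzero are assumptions made for the example and are not stated in this summary.

```python
def run_group(group, loop_count=0):
    """Trace one group of instructions held in the instruction register.

    group      -- list of opcode names, in the order they are supplied
    loop_count -- starting value of the loop counter (assumed semantics)
    Returns the list of opcodes in the order they were decoded.
    """
    counter = 0      # selects which instruction of the group is decoded
    trace = []
    while counter < len(group):
        opcode = group[counter]
        trace.append(opcode)
        if opcode == "SKIP":
            # Decoder resets the counter and signals the fetch means;
            # the remaining instructions of the group are not decoded.
            counter = 0
            break
        if opcode == "MICROLOOP" and loop_count > 0:
            loop_count -= 1   # decrement control signal to the loop counter
            counter = 0       # reset control signal to the counter
            continue
        counter += 1          # incrementing control signal
    return trace
```

Under these assumptions, a group containing a MICROLOOP re-executes its leading instructions once per remaining loop count without any further instruction fetch, which is the property the reset and decrement signals make possible.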

In a still further aspect of the invention, the
microprocessor system includes an arithmetic logic unit.
A first push down stack is connected to the arithmetic
logic unit. The first push down stack includes means for
storing a top item connected to a first input of the
arithmetic logic unit and means for storing a next item
connected to a second input of the arithmetic logic unit.
The arithmetic logic unit has an output connected to the
means for storing a top item. The means for storing a top
item is connected to provide an input to a register file.
The register file desirably is a second push down stack,
and the means for storing a top item and the register
file are bidirectionally connected.

In another aspect of the invention, a data processing
system has a microprocessor including a sensing circuit
and a driver circuit, a memory, and an output enable line
connected between the memory, the sensing circuit and the
driver circuit. The sensing circuit is configured to
provide a ready signal when the output enable line
reaches a predetermined electrical level, such as a
voltage. The microprocessor is configured so that the
driver circuit provides an enabling signal on the output
enable line responsive to the ready signal.

In a further aspect of the invention, the 
microprocessor system has a ring counter variable speed 
system clock connected to the central processing unit. 
The central processing unit and the ring counter variable 
speed system clock are provided in a single integrated 
circuit. An input/output interface is connected to
exchange coupling control signals, addresses and data with
the input/output interface. A second clock independent
of the ring counter variable speed system clock is
connected to the input/output interface.

In yet another aspect of the invention, a push down 
stack is connected to the arithmetic logic unit. The 
push down stack includes means for storing a top item 
connected to a first input of the arithmetic logic unit 
and means for storing a next item connected to a second 
input of the arithmetic logic unit. The arithmetic logic 
unit has an output connected to the means for storing a
top item. The push down stack has a first plurality of
stack elements configured as latches and a second
plurality of stack elements configured as a random access
memory. The first and second plurality of stack elements
and the central processing unit are provided in a single
integrated circuit. A third plurality of stack elements
is configured as a random access memory external to the
single integrated circuit. In this aspect of the
invention, desirably a first pointer is connected to the
first plurality of stack elements, a second pointer is
connected to the second plurality of stack elements, and
a third pointer is connected to the third plurality of
stack elements. The central processing unit is connected
to pop items from the first plurality of stack elements.
The first stack pointer is connected to the second stack
pointer to pop a first plurality of items from the second
plurality of stack elements when the first plurality of
stack elements are empty from successive pop operations by
the central processing unit. The second stack pointer is
connected to the third stack pointer to pop a second
plurality of items from the third plurality of stack
elements when the second plurality of stack elements are
empty from successive pop operations by the central
processing unit.
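
The refill behavior of the three stack tiers can be sketched in software. The Python class below is a minimal model, not the disclosed circuit: the class name TieredStack, the tier sizes, and the spill policy used on push are assumptions for illustration, since this summary describes only what happens on successive pops.

```python
class TieredStack:
    """Model of a stack held in latches, on-chip RAM and external RAM."""

    def __init__(self, latch_size=4, onchip_size=8):
        self.latch_size = latch_size
        self.onchip_size = onchip_size
        self.latches = []    # first plurality: items nearest the ALU
        self.onchip = []     # second plurality: on-chip random access memory
        self.external = []   # third plurality: external random access memory

    def push(self, item):
        # Assumed policy: when the latches are full, spill them as a block.
        if len(self.latches) == self.latch_size:
            if len(self.onchip) + len(self.latches) > self.onchip_size:
                self.external.extend(self.onchip)
                self.onchip = []
            self.onchip.extend(self.latches)
            self.latches = []
        self.latches.append(item)

    def pop(self):
        if not self.latches:
            self._refill()
        return self.latches.pop()

    def _refill(self):
        # Second tier empty: pop a block of items from the third tier.
        if not self.onchip and self.external:
            count = min(self.onchip_size, len(self.external))
            self.onchip = self.external[-count:]
            del self.external[-count:]
        # First tier empty: pop a block of items from the second tier.
        count = min(self.latch_size, len(self.onchip))
        if count == 0:
            raise IndexError("pop from empty stack")
        self.latches = self.onchip[-count:]
        del self.onchip[-count:]
```

The central processing unit in this model only ever touches the latches; the slower tiers are visited in blocks, and only when a faster tier has been emptied.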

In another aspect of the invention, a first register 
is connected to supply a first input to the arithmetic 
logic unit. A first shifter is connected between an 
output of the arithmetic logic unit and the first 
register. A second register is connected to receive a 
starting polynomial value. An output of the second 
register is connected to a second shifter. A least 
significant bit of the second register is connected to 
the arithmetic logic unit. A third register is connected
to supply feedback terms of a polynomial to the arithmetic 
logic unit. A down counter, for counting down a number 
corresponding to digits of a polynomial to be generated,
is connected to the arithmetic logic unit. The arithmetic 
logic unit is responsive to a polynomial instruction to 
carry out an exclusive OR of the contents of the first 
register with the contents of the third register if the 
least significant bit of the second register is a "ONE" 
and to pass the contents of the first register unaltered 
if the least significant bit of the second register is a 
"ZERO", until the down counter completes a count. The 
polynomial to be generated results in said first register. 

In still another aspect of the invention, a result 
register is connected to supply a first input to the 
arithmetic logic unit. A first, left shifting shifter is 
connected between an output of the arithmetic logic unit 
and the result register. A multiplier register is 
connected to receive a multiplier in bit reversed form. 
An output of the multiplier register is connected to a 
second, right shifting shifter. A least significant bit 
of the multiplier register is connected to the arithmetic 
logic unit. A third register is connected to supply a 
multiplicand to said arithmetic logic unit. A down 
counter, for counting down a number corresponding to one 
less than the number of digits of the multiplier, is 
connected to the arithmetic logic unit. The arithmetic 
logic unit is responsive to a multiply instruction to add 
the contents of the result register with the contents of 
the third register, when the least significant bit of the 
multiplier register is a "ONE" and to pass the contents of 
the result register unaltered, until the down counter 
completes a count. The product results in the result 
register. 
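
The multiply aspect amounts to a shift-and-add loop that consumes the multiplier most significant bit first, which is why the multiplier is loaded in bit reversed form. The Python sketch below illustrates that loop under stated assumptions: the function names, the default 32-bit width, the initial zero in the result register, and the choice not to shift after the final addition are all assumptions of this example. The result is also left unbounded rather than truncated to the register width.

```python
def bit_reverse(value, width):
    """Return value with its low `width` bits in reversed order."""
    reversed_bits = 0
    for _ in range(width):
        reversed_bits = (reversed_bits << 1) | (value & 1)
        value >>= 1
    return reversed_bits


def shift_add_multiply(multiplicand, multiplier, width=32):
    """Multiply using a left-shifting result and a right-shifting multiplier."""
    result = 0                                  # result register
    mult_reg = bit_reverse(multiplier, width)   # multiplier register
    count = width - 1                           # down counter: digits minus one
    while True:
        if mult_reg & 1:          # least significant bit is a "ONE"
            result += multiplicand
        mult_reg >>= 1            # second, right shifting shifter
        if count == 0:            # down counter has completed its count
            break
        result <<= 1              # first, left shifting shifter
        count -= 1
    return result
```

For example, shift_add_multiply(6, 7, width=8) walks the eight multiplier bits from the most significant end and leaves 42 in the result.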

The attainment of the foregoing and related objects, 
advantages and features of the invention should be more 
readily apparent to those skilled in the art, after review 
of the following more detailed description of the 
invention, taken together with the drawings, in which: 
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is an external, plan view of an integrated
circuit package incorporating a microprocessor in
accordance with the invention.

Figure 2 is a block diagram of a microprocessor in
accordance with the invention.

Figure 3 is a block diagram of a portion of a data
processing system incorporating the microprocessor of
Figures 1 and 2.

Figure 4 is a more detailed block diagram of a
portion of the microprocessor shown in Figure 2.

Figure 5 is a more detailed block diagram of another
portion of the microprocessor shown in Figure 2.

Figure 6 is a block diagram of another portion of the
data processing system shown in part in Figure 3 and
incorporating the microprocessor of Figures 1-2 and 4-5.

Figures 7 and 8 are layout diagrams for the data
processing system shown in part in Figures 3 and 6.

Figure 9 is a layout diagram of a second embodiment
of a microprocessor in accordance with the invention in a
data processing system on a single integrated circuit.

Figure 10 is a more detailed block diagram of a
portion of the data processing system of Figures 7 and 8.

Figure 11 is a timing diagram useful for
understanding operation of the system portion shown in
Figure 12.

Figure 12 is another more detailed block diagram of a
further portion of the data processing system of Figures 7
and 8.

Figure 13 is a more detailed block diagram of a
portion of the microprocessor shown in Figure 2.

Figure 14 is a more detailed block and schematic
diagram of a portion of the system shown in Figures 3 and
7-8.

Figure 15 is a graph useful for understanding
operation of the system portion shown in Figure 14.

Figure 16 is a more detailed block diagram showing
part of the system portion shown in Figure 4.

Figure 17 is a more detailed block diagram of a
portion of the microprocessor shown in Figure 2.

Figure 18 is a more detailed block diagram of part of
the microprocessor portion shown in Figure 17.

Figure 19 is a set of waveform diagrams useful for
understanding operation of the part of the microprocessor
portion shown in Figure 18.

Figure 20 is a more detailed block diagram showing
another part of the system portion shown in Figure 4.

Figure 21 is a more detailed block diagram showing
another part of the system portion shown in Figure 4.

Figures 22 and 23 are more detailed block diagrams
showing another part of the system portion shown in Figure
4.



DETAILED DESCRIPTION OF THE INVENTION

OVERVIEW

The microprocessor of this invention is desirably
implemented as a 32-bit microprocessor optimized for:

HIGH EXECUTION SPEED, and
LOW SYSTEM COST.

In this embodiment, the microprocessor can be thought of
as 20 MIPS for 20 dollars. Important distinguishing
features of the microprocessor are:

Uses low-cost commodity DYNAMIC RAMS to run 20 MIPS
4 instruction fetch per memory cycle
On-chip fast page-mode memory management
Runs fast without external cache
Requires few interfacing chips
Crams 32-bit CPU in 44 pin SOJ package

The instruction set is organized so that most
operations can be specified with 8-bit instructions. Two
positive products of this philosophy are:

Programs are smaller,
Programs can execute much faster.

The bottleneck in most computer systems is the memory
bus. The bus is used to fetch instructions and fetch and
store data. The ability to fetch four instructions in a
single memory bus cycle significantly increases the bus
availability to handle data.
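
The effect on bus availability can be put in numbers. The short Python sketch below counts the bus cycles spent on instruction fetch for a run of 8-bit instructions; the function name and the simplifying assumption that every fetch occupies exactly one bus cycle are assumptions of this example.

```python
def fetch_bus_cycles(instruction_count, instructions_per_fetch):
    """Bus cycles needed to fetch a run of instructions.

    Assumes each fetch takes one bus cycle and returns a fixed number of
    instructions; a partly filled final fetch still costs a full cycle.
    """
    # Ceiling division using integers only.
    return -(-instruction_count // instructions_per_fetch)


one_per_cycle = fetch_bus_cycles(1000, 1)    # conventional: 1000 cycles
four_per_cycle = fetch_bus_cycles(1000, 4)   # four per fetch: 250 cycles
cycles_freed = one_per_cycle - four_per_cycle
```

Under these assumptions, 750 of every 1000 bus cycles that would otherwise carry instructions become available for data.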
Turning now to the drawings, more particularly to
Figure 1, there is shown a packaged 32-bit microprocessor
50 in a 44-pin plastic leadless chip carrier, shown
approximately 100 times its actual size of about 0.8 inch
on a side. The fact that the microprocessor 50 is
provided as a 44-pin package represents a substantial
departure from typical microprocessor packages, which
usually have about 200 input/output (I/O) pins. The
microprocessor 50 is rated at 20 million instructions per
second (MIPS). Address and data lines 52, also labelled
D0-D31, are shared for addresses and data without speed
penalty as a result of the manner in which the
microprocessor 50 operates, as will be explained below.

DYNAMIC RAM

In addition to the low cost 44-pin package, another
unusual aspect of the high performance microprocessor 50
is that it operates directly with dynamic random access
memories (DRAMs), as shown by row address strobe (RAS) and
column address strobe (CAS) I/O pins 54. The other I/O
pins for the microprocessor 50 include VDD pins 56, VSS
pins 58, output enable pin 60, write pin 62, clock pin 64
and reset pin 66.

All high speed computers require high speed and
expensive memory to keep up. The highest speed static RAM
memories cost as much as ten times as much as slower
dynamic RAMs. This microprocessor has been optimized to
use low-cost dynamic RAM in high-speed page-mode.
Page-mode dynamic RAMs offer static RAM performance
without the cost penalty. For example, low-cost 85 nsec
dynamic RAMs access at 25 nsec when operated in fast
page-mode. Integrated fast page-mode control on the
microprocessor chip simplifies system interfacing and
results in a faster system.
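
The benefit of page-mode operation depends on how often successive accesses fall in the same page. The Python sketch below estimates an average access time from the 85 nsec and 25 nsec figures given above; the function name, the page-hit fraction parameter, and the simplification that an out-of-page access costs exactly the rated 85 nsec are assumptions of this example.

```python
def average_access_ns(page_hit_fraction, page_ns=25.0, full_ns=85.0):
    """Weighted average access time for a page-mode DRAM.

    page_hit_fraction -- share of accesses that stay within the open page
    page_ns           -- access time in fast page-mode (from the text)
    full_ns           -- rated access time otherwise (from the text)
    """
    if not 0.0 <= page_hit_fraction <= 1.0:
        raise ValueError("page_hit_fraction must lie between 0 and 1")
    return page_hit_fraction * page_ns + (1.0 - page_hit_fraction) * full_ns
```

With a hypothetical 90 percent of accesses staying in the open page, the estimate comes to about 31 nsec, much closer to the page-mode figure than to the rated one.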
Details of the microprocessor 50 are shown in Figure
2. The microprocessor 50 includes a main central
processing unit (CPU) 70 and a separate direct memory
access (DMA) CPU 72 in a single integrated circuit making
up the microprocessor 50. The main CPU 70 has a first 16
deep push down stack 74, which has a top item register 76
and a next item register 78, respectively connected to
provide inputs to an arithmetic logic unit (ALU) 80 by
lines 82 and 84. An output of the ALU 80 is connected to
the top item register 76 by line 86. The output of the
top item register at 82 is also connected by line 88 to an
internal data bus 90.
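
The data path formed by the top item register 76, the next item register 78 and the ALU 80 can be modelled in a few lines. The Python sketch below is only an illustration: the class name StackALU, the operand order passed to the operation, the zero fill when the stack runs out, and the omission of the 16-deep limit are assumptions of this example.

```python
MASK32 = 0xFFFFFFFF  # the microprocessor is described as a 32-bit machine


class StackALU:
    """Top and next registers feed the ALU; the result returns to top."""

    def __init__(self):
        self.top = 0       # top item register
        self.next = 0      # next item register
        self.deeper = []   # remaining stack items below next

    def push(self, value):
        self.deeper.append(self.next)
        self.next = self.top
        self.top = value & MASK32

    def alu(self, operation):
        """Apply a two-input operation to (next, top) and store it in top."""
        result = operation(self.next, self.top) & MASK32
        self.top = result
        # The item below moves up into the next item register.
        self.next = self.deeper.pop() if self.deeper else 0
        return result


stack = StackALU()
stack.push(2)
stack.push(3)
total = stack.alu(lambda n, t: n + t)   # top now holds 5
```

Because both operands are already sitting at the ALU inputs, an instruction such as an add needs no operand fields, which is one reason 8-bit instructions suffice for most operations.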

A loop counter 92 is connected to a decrementer 94 by
lines 96 and 98. The loop counter 92 is bidirectionally
connected to the internal data bus 90 by line 100. Stack
pointer 102, return stack pointer 104, mode register 106
and instruction register 108 are also connected to the
internal data bus 90 by lines 110, 112, 114 and 116,
respectively. The internal data bus 90 is connected to
memory controller 118 and to gate 120. The gate 120
provides inputs on lines 122, 124, and 126 to X register
128, program counter 130 and Y register 132 of return
push down stack 134. The X register 128, program counter
130 and Y register 132 provide outputs to internal address
bus 136 on lines 138, 140 and 142. The internal address
bus provides inputs to the memory controller 118 and to an
incrementer 144. The incrementer 144 provides inputs to
the X register, program counter and Y register via lines
146, 122, 124 and 126. The DMA CPU 72 provides inputs to
the memory controller 118 on line 148. The memory
controller 118 is connected to a RAM (not shown) by
address/data bus 150 and control lines 152.
delays. This is a function of the small number of gates
in the microprocessor 50 and the high degree of
parallelism in the architecture of the microprocessor.

Figure 3 shows how column and row addresses are
multiplexed on lines D8-D14 of the microprocessor 50 for
addressing DRAM 150 from I/O pins 52. The DRAM 150 is one
of eight, but only one DRAM 150 has been shown for
clarity. As shown, the lines D11-D18 are respectively
connected to row address inputs A0-A8 of the DRAM 150.
Additionally, lines D12-D15 are connected to the data
inputs DQ1-DQ4 of the DRAM 150. The output enable, write
and column address strobe pins 54 are respectively
connected to the output enable, write and column address
strobe inputs of the DRAM 150 by lines 152. The row
address strobe pin 54 is connected through row address
strobe decode logic 154 to the row address strobe input of
the DRAM 150 by lines 156 and 158.

D0-D7 pins 52 (Figure 1) are idle when the
microprocessor 50 is outputting multiplexed row and column
addresses on D11-D18 pins 52. The D0-D7 pins 52 can
therefore simultaneously be used for I/O when right
justified I/O is desired. Simultaneous addressing and I/O
can therefore be carried out.
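
The general idea of presenting a row address, then a column address, then data on the same lines can be sketched as follows. This Python example is generic and does not reproduce the pin assignments of Figure 3: the 9-bit row width follows the A0-A8 inputs mentioned above, while the 9-bit column width, the choice of which address bits form the row, and the function names are assumptions of this example.

```python
ROW_BITS = 9   # row address inputs A0-A8
COL_BITS = 9   # assumed: same width for the column address


def split_address(word_address):
    """Split a DRAM word address into (row, column) fields."""
    column = word_address & ((1 << COL_BITS) - 1)
    row = (word_address >> COL_BITS) & ((1 << ROW_BITS) - 1)
    return row, column


def bus_sequence(word_address, data):
    """Return the phases that share the multiplexed lines, in order."""
    row, column = split_address(word_address)
    return [
        ("RAS", row),      # row address, latched by row address strobe
        ("CAS", column),   # column address, latched by column address strobe
        ("DATA", data),    # data transferred on the same lines
    ]
```

In fast page-mode, consecutive accesses with the same row value can skip the first phase and present only a new column address.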

Figure 4 shows how the microprocessor 50 is able to
achieve performance equal to the use of static RAMS with
DRAMs through multiple instruction fetch in a single clock
cycle and instruction fetch-ahead. Instruction register
108 receives four 8-bit byte instruction words 1-4 on
32-bit internal data bus 90. The four instruction byte 1-4
locations of the instruction register 108 are connected to
multiplexer 170 by busses 172, 174, 176 and 178,
respectively. A microprogram counter 180 is connected to
the multiplexer 170 by lines 182. The multiplexer 170 is
connected to decoder 184 by bus 186. The decoder 184
provides internal signals to the rest of the
microprocessor 50 on lines 188.
Most significant bits 190 of each instruction byte
1-4 location are connected to a 4-input decoder 192 by
lines 194. The output of decoder 192 is connected to
memory controller 118 by line 196. Program counter 130 is
connected to memory controller 118 by internal address bus
136, and the instruction register 108 is connected to the
memory controller 118 by the internal data bus 90.
Address/data bus 198 and control bus 200 are connected to
the DRAMS 150 (Figure 3).

In operation, when the most significant bits 190 of
remaining instructions 1-4 are "1" in a clock cycle of the
microprocessor 50, there are no memory reference
instructions in the queue. The output of decoder 192 on
line 196 requests an instruction fetch ahead by memory
controller 118 without interference with other accesses.
While the current instructions in instruction register 108
are executing, the memory controller 118 obtains the
address of the next set of four instructions from program
counter 130 and obtains that set of instructions. By the
time the current set of instructions has completed
execution, the next set of instructions is ready for
loading into the instruction register.
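
The fetch-ahead decision reduces to a test on the most significant bit of each instruction byte still waiting in the instruction register. The Python sketch below shows that test; the function names, the byte order (instruction 1 in the most significant byte of the 32-bit word) and the next_index parameter standing in for the microprogram counter are assumptions of this example.

```python
def unpack_group(word):
    """Split a 32-bit word into four 8-bit instructions, 1 through 4."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def can_fetch_ahead(word, next_index=0):
    """True when no remaining instruction is a memory reference.

    An instruction whose most significant bit is 1 is taken, as in the
    text, to make no memory reference, so the bus is free for a fetch.
    """
    remaining = unpack_group(word)[next_index:]
    return all(byte & 0x80 for byte in remaining)
```

For example, can_fetch_ahead(0x81828384) is true because all four bytes have their top bit set, while a word containing the byte 0x02 among its remaining instructions would hold off the fetch-ahead until that instruction has executed.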

Details of the DMA CPU 72 are provided in Figure 5. 
Internal data bus 90 is connected to memory controller 118 
and to DMA instruction register 210. The DMA instruction 
register 210 is connected to DMA program counter 212 by 
bus 214, to transfer size counter 216 by bus 218 and to 
timed transfer interval counter 220 by bus 222. The DMA 
instruction register 210 is also connected to DMA I/O and 
RAM address register 224 by line 226. The DMA I/O and RAM
address register 224 is connected to the memory controller 
118 by memory cycle request line 228 and bus 230. The DMA 
program counter 212 is connected to the internal address 
bus 136 by bus 232. The transfer size counter 216 is 
connected to a DMA instruction done decrementer 234 by 
lines 236 and 238. The decrementer 234 receives a control 
input on memory cycle acknowledge line 240. When transfer
size counter 216 has completed its count, it provides a
control signal to DMA program counter 212 on line 242.
Timed transfer interval counter 220 is connected to
decrementer 244 by lines 246 and 248. The decrementer 244
receives a control input from a microprocessor system
clock on line 250.

The DMA CPU 72 controls itself and has the ability to
fetch and execute instructions. It operates as a
co-processor to the main CPU 70 (Figure 2) for time specific
processing.

Figure 6 shows how the microprocessor 50 is connected
to an electrically programmable read only memory (EPROM)
260 by reconfiguring the data lines 52 so that some of the
data lines 52 are input lines and some of them are output
lines. Data lines 52 D0-D7 provide data to and from
corresponding data terminals 262 of the EPROM 260. Data
lines 52 D9-D18 provide addresses to address terminals 264
of the EPROM 260. Data lines 52 D19-D31 provide inputs
from the microprocessor 50 to memory and I/O decode logic
266. RAS 0/1 control line 268 provides a control signal
for determining whether the memory and I/O decode logic
provides a DRAM RAS output on line 270 or a column enable
output for the EPROM 260 on line 272. Column address
strobe terminal 60 of the microprocessor 50 provides an
output enable signal on line 274 to the corresponding
terminal 276 of the EPROM 260.

Figures 7 and 8 show the front and back of a one card
data processing system 280 incorporating the
microprocessor 50, MSM514258-10 type DRAMs 150 totalling 2
megabytes, a Motorola 50 MegaHertz crystal oscillator
clock 282, I/O circuits 284 and a 27256 type EPROM 260.
The I/O circuits 284 include a 74HC04 type high speed hex
inverter circuit 286, an IDT39C828 type 10-bit inverting
buffer circuit 288, an IDT39C822 type 10-bit inverting
register circuit 290, and two IDT39C823 type 9-bit
non-inverting register circuits 292. The card 280 is
completed with a KAX12V type DC-DC converter circuit 294,
34-pin dual AMP type headers 296, a coaxial female power
connector 298, and a 3-pin AMP right angle header 300.
The card 280 is a low cost, imbeddable product that can be
incorporated in larger systems or used as an internal
development tool.

The microprocessor 50 is a very high performance (50 
MHz) RISC influenced 32-bit CPU designed to work closely
with dynamic RAM. Clock for clock, the microprocessor 50 
approaches the theoretical performance limits possible 
with a single CPU configuration. Eventually, the 
microprocessor 50 and any other processor is limited by 
the bus bandwidth and the number of bus paths. The 
critical conduit is between the CPU and memory. 

One solution to the bus bandwidth/bus path problem is 
to integrate a CPU directly onto the memory chips, giving 
every memory a direct bus to the CPU. Figure 9 shows another
microprocessor 310 that is provided integrally with 1
megabit of DRAM 311 in a single integrated circuit 312.
Until the present invention, this solution has not been 
practical, because most high performance CPUs require from 
500,000 to 1,000,000 transistors and enormous die sizes 
just by themselves. The microprocessor 310 is equivalent 
to the microprocessor 50 in Figures 1-8. The 
microprocessors 50 and 310 are the most transistor 
efficient high performance CPUs in existence, requiring 
fewer than 50,000 transistors for dual processors 70 and 
72 (Figure 2) or 314 and 316 (less memory). The very high 
speed of the microprocessors 50 and 310 is to a certain 
extent a function of the small number of active devices. 
In essence, the less silicon gets in the way, the faster 
the electrons can get where they are going. 

The microprocessor 310 is therefore the only CPU 
suitable for integration on the memory chip die 312. Some 
simple modifications to the basic microprocessor 50 to 
take advantage of the proximity to the DRAM array 311 can
also increase the microprocessor 50 clock speed by 50
percent, and probably more.

The microprocessor 310 core on board the DRAM die 312
provides most of the speed and functionality required for
a large group of applications from automotive to
peripheral control. However, the integrated CPU 310/DRAM
311 concept has the potential to redefine significantly
the way multiprocessor solutions can solve a spectrum of
very compute intensive problems. The CPU 310/DRAM 311
combination eliminates the Von Neumann bottleneck by
distributing it across numerous CPU/DRAM chips 312. The
microprocessor 310 is a particularly good core for
multiprocessing, since it was designed with the SDI
targeting array in mind, and provisions were made for
efficient interprocessor communications.

Traditional multiprocessor implementations have been 
very expensive in addition to being unable to exploit 
fully the available CPU horsepower. Multiprocessor
systems have typically been built up from numerous board 
level or box level computers. The result is usually an 
immense amount of hardware with corresponding wiring, 
power consumption and communications problems. By the 
time the systems are interconnected, as much as 50 percent 
of the bus speed has been utilized just getting through 
the interfaces. 

In addition, multiprocessor system software has been 
scarce. A multiprocessor system can easily be crippled by 
an inadequate load-sharing algorithm in the system 
software, which allows one CPU to do a great deal of work 
and the others to be idle. Great strides have been made 
recently in systems software, and even UNIX V.4 may be 
enhanced to support multiprocessing. Several commercial 
products from such manufacturers as DUAL Systems and 
UNISOFT do a credible job on 68030 type microprocessor 
systems now. 
The microprocessor 310 architecture eliminates most
of the interface friction, since up to 64 CPU 310/RAM 311
processors should be able to intercommunicate without
buffers or latches. Each chip 312 has about 40 MIPS raw
speed, because placing the DRAM 311 next to the CPU 310
allows the microprocessor 310 instruction cycle to be cut
in half, compared to the microprocessor 50. A 64 chip
array of these chips 312 is more powerful than any other
existing computer. Such an array fits on a 3 X 5 card,
costs less than a FAX machine, and draws about the same
power as a small television.

Dramatic changes in price/performance always reshape 
existing applications and almost always create new ones. 
The introduction of microprocessors in the mid 1970s 

15 created video games, personal computers, automotive 
computers, electronically controlled appliances, and low 
cost computer peripherals. 

The integrated circuit 312 will find applications in 
all of the above areas, plus create some new ones. A
common generic parallel processing algorithm handles
convolution/Fast Fourier Transform (FFT)/pattern
recognition. Interesting product possibilities using the 
integrated circuit 312 include high speed reading 
machines, real-time speech recognition, spoken language
translation, real-time robot vision, a product to identify
people by their faces, and an automotive or aviation 
collision avoidance system. 

A real time processor for enhancing high density 
television (HDTV) images, or compressing the HDTV
information into a smaller bandwidth, would be very
feasible. The load sharing in HDTV could be very
straightforward. Splitting up the task according to color
and frame would require 6, 9 or 12 processors. Practical
implementation might require 4 meg RAMs integrated with
the microprocessor 310.

The microprocessor 310 has the following 



A-50412/WEH

specifications:

CONTROL LINES
4 - POWER/GROUND
1 - CLOCK
32 - DATA I/O
4 - SYSTEM CONTROL
    EXTERNAL MEMORY FETCH
    EXTERNAL MEMORY FETCH AUTOINCREMENT X
    EXTERNAL MEMORY FETCH AUTOINCREMENT Y
    EXTERNAL MEMORY WRITE
    EXTERNAL MEMORY WRITE AUTOINCREMENT X
    EXTERNAL MEMORY WRITE AUTOINCREMENT Y
    EXTERNAL PROM FETCH
    LOAD ALL X REGISTERS
    LOAD ALL Y REGISTERS
    LOAD ALL PC REGISTERS
    EXCHANGE X AND Y
    INSTRUCTION FETCH
    ADD TO PC
    ADD TO X
    WRITE MAPPING REGISTER
    READ MAPPING REGISTER

REGISTER CONFIGURATION
MICROPROCESSOR 310 CPU 316 CORE
    COLUMN LATCH 1 (1024 BITS) 32 X 32 MUX
    STACK POINTER (16 BITS)
    COLUMN LATCH 2 (1024 BITS) 32 X 32 MUX
    RSTACK POINTER (16 BITS)
    PROGRAM COUNTER 32 BITS
    XO REGISTER 32 BITS (ACTIVATED ONLY FOR ON-CHIP ACCESSES)
    YO REGISTER 32 BITS (ACTIVATED ONLY FOR ON-CHIP ACCESSES)
    LOOP COUNTER 32 BITS

DMA CPU 314 CORE
    DMA PROGRAM COUNTER 24 BITS
    INSTRUCTION REGISTER 32 BITS
    I/O & RAM ADDRESS REGISTER 32 BITS
    TRANSFER SIZE COUNTER 12 BITS
    INTERVAL COUNTER 12 BITS

To offer memory expansion for the basic chip, an
intelligent DRAM can be produced. This chip will be
optimized for high speed operation with the integrated
circuit 312 by having three on-chip address registers:
Program Counter, X Register and Y register. As a result,
to access the intelligent DRAM, no address is required,
and a total access cycle could be as short as 10 nsec.
Each expansion DRAM would maintain its own copy of the
three registers and would be identified by a code
specifying its memory address. Incrementing and adding to
the three registers will actually take place on the memory
chips. A maximum of 64 intelligent DRAM peripherals would
allow a large system to be created without sacrificing
speed by introducing multiplexers or buffers.
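
The register mirroring described above can be illustrated
with a short simulation. This is only a sketch under stated
assumptions: the class and function names, the word-addressed
memory, and the rule that a chip's identifying code is the
high-order part of the address are all hypothetical and are
not taken from the specification.

```python
class IntelligentDram:
    """Toy expansion DRAM holding its own copies of PC, X and Y (hypothetical)."""

    def __init__(self, chip_code, size_words):
        self.chip_code = chip_code      # assumed: code equals address // size_words
        self.size = size_words
        self.mem = [0] * size_words
        self.regs = {"PC": 0, "X": 0, "Y": 0}

    def load(self, reg, value):
        # A LOAD ALL command reaches every chip, so all copies stay identical.
        self.regs[reg] = value

    def fetch_autoincrement(self, reg):
        # No address travels on the bus: the chip consults its local copy.
        addr = self.regs[reg]
        self.regs[reg] = addr + 1       # the increment happens on the memory chip
        if addr // self.size != self.chip_code:
            return None                 # another chip owns this address
        return self.mem[addr % self.size]


def broadcast_fetch(chips, reg):
    """Issue one fetch command to all chips; at most one of them answers."""
    replies = [chip.fetch_autoincrement(reg) for chip in chips]
    answers = [r for r in replies if r is not None]
    return answers[0] if answers else None


chips = [IntelligentDram(code, 4) for code in range(2)]
chips[0].mem = [10, 11, 12, 13]
chips[1].mem = [20, 21, 22, 23]
for chip in chips:
    chip.load("X", 2)
words = [broadcast_fetch(chips, "X") for _ in range(4)]
```

In this model the four fetches cross from the first chip to
the second without any address being sent, because every
chip advances its own copy of the X register in step.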

There are certain differences between the
microprocessor 310 and the microprocessor 50 that arise
from providing the microprocessor 310 on the same die 312
with the DRAM 311. Integrating the DRAM 311 allows
architectural changes in the microprocessor 310 logic to
take advantage of existing on-chip DRAM 311 circuitry.
Row and column design is inherent in memory architecture.
The DRAMs 311 access random bits in a memory array by
first selecting a row of 1024 bits, storing them into a
column latch, and then selecting one of the bits as the
data to be read or written.
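
The row-then-column access just described can be modeled in
a few lines. The sketch below is illustrative only: the class
name, the counters, and the write-back of the latch when a
different row is opened are assumptions, not details from the
specification. Only the 1024-bit row width comes from the text.

```python
ROW_BITS = 1024  # row width given in the text


class ColumnLatchDram:
    """Toy DRAM array with a single column latch (hypothetical model)."""

    def __init__(self, rows):
        self.rows = [[0] * ROW_BITS for _ in range(rows)]
        self.latch = None
        self.latched_row = None
        self.row_selects = 0    # slow accesses: a new row is copied to the latch
        self.latch_hits = 0     # fast accesses: the bit is already in the latch

    def _open_row(self, row):
        if row == self.latched_row:
            self.latch_hits += 1
            return
        if self.latched_row is not None:
            # Assumed behavior: the old latch contents are written back.
            self.rows[self.latched_row] = list(self.latch)
        self.latch = list(self.rows[row])
        self.latched_row = row
        self.row_selects += 1

    def read_bit(self, bit_address):
        row, col = divmod(bit_address, ROW_BITS)
        self._open_row(row)
        return self.latch[col]

    def write_bit(self, bit_address, value):
        row, col = divmod(bit_address, ROW_BITS)
        self._open_row(row)
        self.latch[col] = value & 1


dram = ColumnLatchDram(rows=4)
dram.write_bit(5, 1)
bits = [dram.read_bit(a) for a in range(2 * ROW_BITS)]
```

Reading two full rows in order needs only two row
selections in this model; every other access is served from
the latch.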

The time required to access the data is split between
the row access and the column access. Selecting data
already stored in a column latch is faster than selecting
a random bit by at least a factor of six. The
microprocessor 310 takes advantage of this high speed by
creating a number of column latches and using them as
caches and shift registers. Selecting a new row of
information may be thought of as performing a 1024-bit
read or write with the resulting immense bus bandwidth.

1. The microprocessor 50 treats its 32-bit
instruction register 108 (see Figures 2 and 4) as a cache
for four 8-bit instructions. Since the DRAM 311 maintains
a 1024-bit latch for the column bits, the microprocessor
310 treats the column latch as a cache for 128 8-bit
instructions. Therefore, the next instruction will almost
always be already present in the cache. Long loops within
the cache are also possible and more useful than the 4
instruction loops in the microprocessor 50.

2. The microprocessor 50 uses two 16 x 32-bit deep
register arrays 74 and 134 (Figure 2) for the parameter
stack and the return stack. The microprocessor 310
creates two other 1024-bit column latches to provide the
equivalent of two 32 X 32-bit arrays, which can be
accessed twice as fast as a register array.

3. The microprocessor 50 has a DMA capability which
can be used for I/O to a video shift register. The
microprocessor 310 uses yet another 1024-bit column latch
as a long video shift register to drive a CRT display
directly. For color displays, three on-chip shift
registers could also be used. These shift registers can
transfer pixels at a maximum of 100 MHz.

4. The microprocessor 50 accesses memory via an
external 32-bit bus. Most of the memory 311 for the
microprocessor 310 is on the same die 312. External
access to more memory is made using an 8-bit bus. The
result is a smaller die, smaller package and lower power
consumption than the microprocessor 50.

5. The microprocessor 50 consumes about a third of
its operating power charging and discharging the I/O pins
and associated capacitances. The DRAMs 150 (Figure 8)
connected to the microprocessor 50 dissipate most of their
power in the I/O drivers. A microprocessor 310 system
will consume about one-tenth the power of a microprocessor
50 system, since having the DRAM 311 next to the processor
310 eliminates most of the external capacitances to be
charged and discharged.

6. Multiprocessing means splitting a computing task
between numerous processors in order to speed up the
solution. The popularity of multiprocessing is limited by
the expense of current individual processors as well as
the limited interprocessor communications ability. The
microprocessor 310 is an excellent multiprocessor
candidate, since the chip 312 is a monolithic computer
complete with memory, rendering it low-cost and physically
compact.

The shift registers implemented with the
microprocessor 310 to perform video output can also be
configured as interprocessor communication links. The 
INMOS transputer attempted a similar strategy, but at much 
lower speed and without the performance benefits inherent 
in the microprocessor 310 column latch architecture. 
Serial I/O is a prerequisite for many multiprocessor 
topologies because of the many neighbor processors which 
communicate. A cube has 6 neighbors. Each neighbor 
communicates using these lines:
DATA IN
CLOCK IN
READY FOR DATA
DATA OUT
DATA READY?
CLOCK OUT

A special start up sequence is used to initialize the
on-chip DRAM 311 in each of the processors.

The microprocessor 310 column latch architecture
allows neighbor processors to deliver information directly
to internal registers or even instruction caches of other 
chips 312. This technique is not used with existing 
processors, because it only improves performance in a 
tightly coupled DRAM system. 

7. The microprocessor 50 architecture offers two
types of looping structures: LOOP-IF-DONE and MICRO-LOOP.
The former takes an 8-bit to 24-bit operand to describe
the entry point to the loop address. The latter performs
a loop entirely within the 4 instruction queue and the
loop entry point is implied as the first instruction in
the queue. Loops entirely within the queue run without
external instruction fetches and execute up to three times
as fast as the long loop construct. The microprocessor
310 retains both constructs with a few differences. The
microprocessor 310 microloop functions in the same fashion
as the microprocessor 50 operation, except the queue is
1024-bits or 128 8-bit instructions long. The
microprocessor 310 microloop can therefore contain jumps,
branches, calls and immediate operations not possible in
the 4 8-bit instruction microprocessor 50 queue.

Microloops in the microprocessor 50 can only perform
simple block move and compare functions. The larger
microprocessor 310 queue allows entire digital signal
processing or floating point algorithms to loop at high
speed in the queue.

The microprocessor 50 offers four instructions to
redirect execution:
CALL
BRANCH
BRANCH-IF-ZERO
LOOP-IF-NOT-DONE

These instructions take a variable length address operand
8, 16 or 24 bits long. The microprocessor 50 next address
logic treats the three operands similarly by adding them
to or subtracting them from the current program counter.
For the microprocessor 310, the 16 and 24-bit operands
function in the same manner as the 16 and 24-bit operands
in the microprocessor 50. The 8-bit class operands are
reserved to operate entirely within the instruction queue.
Next address decisions can therefore be made quickly,
because only 10 bits of addresses are affected, rather
than 32. There is no carry or borrow generated past the
10 bits.
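
The confinement of short branches to the low 10 address bits
can be sketched as follows. This is a minimal illustration,
not the actual next address logic: the function name, the use
of a signed integer offset, and the 32-bit program counter
width in the mask are assumptions made for the example.

```python
QUEUE_ADDRESS_BITS = 10
QUEUE_MASK = (1 << QUEUE_ADDRESS_BITS) - 1   # 0x3FF


def next_address_in_queue(pc, offset):
    """Apply a short (8-bit class) offset to the low 10 bits of pc only."""
    # The sum wraps inside the 10-bit field, so no carry or borrow
    # ever reaches the upper bits of the program counter.
    low = (pc + offset) & QUEUE_MASK
    high = pc & ~QUEUE_MASK & 0xFFFFFFFF
    return high | low


forward = next_address_in_queue(0x000123FE, 4)
backward = next_address_in_queue(0x00012001, -3)
```

Because the upper 22 bits are copied unchanged, only a
10-bit adder is needed for these branches in this model.
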

8. The microprocessor 310 CPU 316 resides on an
already crowded DRAM die 312. To keep chip size as small
as possible, the DMA processor 72 of the microprocessor 50
has been replaced with a more traditional DMA controller
314. DMA is used with the microprocessor 310 to perform
the following functions:

Video output to a CRT
Multiprocessor serial communications
8-bit parallel I/O

The DMA controller 314 can maintain both serial and
parallel transfers simultaneously. The following DMA
sources and destinations are supported by the
microprocessor 310:

DESCRIPTION                  I/O      LINES
1. Video shift register      OUTPUT   1 to 3
2. Multiprocessor serial     BOTH     6 lines/channel
3. 8-bit parallel            BOTH     8 data, 4 control

The three sources use separate 1024-bit buffers and
separate I/O pins. Therefore, all three may be active
simultaneously without interference.

The microprocessor 310 can be implemented with either
a single multiprocessor serial buffer or separate receive
and sending buffers for each channel, allowing
simultaneous bidirectional communications with six
neighbors.

Figures 10 and 11 provide details of the PROM DMA 
used in the microprocessor 50. The microprocessor 50 
executes faster than all but the fastest PROMs. PROMs are
used in a microprocessor 50 system to store program 
segments and perhaps entire programs. The microprocessor 
50 provides a feature on power-up to allow programs to be 
loaded from low-cost, slow speed PROMs into high speed 
DRAM for execution. The logic which performs this 
function is part of the DMA memory controller 118. The 
operation is similar to DMA, but not identical, since four 
8-bit bytes must be assembled on the microprocessor 50
chip, then written to the DRAM 150.

The microprocessor 50 directly interfaces to DRAM 150
over a triple multiplexed data and address bus 350, which
carries RAS addresses, CAS addresses and data. The EPROM
260, on the other hand, is read with non-multiplexed
busses. The microprocessor 50 therefore has a special
mode which unmultiplexes the data and address lines to
read 8 bits of EPROM data. Four 8-bit bytes are read in
this fashion. The multiplexed bus 350 is turned back on,
and the data is written to the DRAM 150.

When the microprocessor 50 detects a RESET condition,
the processor stops the main CPU 70 and forces a mode 0
(PROM LOAD) instruction into the DMA CPU 72 instruction
register. The DMA instruction directs the memory
controller to read the EPROM 260 data at 8 times the
normal access time for memory. Assuming a 50 MHz
microprocessor 50, this means an access time of 320 nsec.
The instruction also indicates:

The selection address of the EPROM 260 to be loaded,
The number of 32-bit words to transfer,
The DRAM 150 address to transfer into.

The sequence of activities to transfer one 32-bit
word from EPROM 260 to DRAM 150 is:

1. RAS goes low at 352, latching the EPROM 260
select information from the high order address bits.
The EPROM 260 is selected.

2. Twelve address bits (consisting of what is
normally DRAM CAS addresses plus two byte select bits)
are placed on the bus 350 going to the EPROM 260
address pins. These signals will remain on the lines
until the data from the EPROM 260 has been read into
the microprocessor 50. For the first byte, the byte
select bits will be binary 00.

3. CAS goes low at 354, enabling the EPROM 260 data
onto the lower 8 bits of the external address/data
bus 350. NOTE: It is important to recognize that,
during this part of the cycle, the lower 8 bits of
the external data/address bus are functioning as
inputs, but the rest of the bus is still acting as
outputs.

4. The microprocessor 50 latches these eight least
significant bits internally and shifts them 8 bits
left to shift them to the next significant byte
position.

5. Steps 2, 3 and 4 are repeated with byte address
01.

6. Steps 2, 3 and 4 are repeated with byte address
10.

7. Steps 2, 3 and 4 are repeated with byte address
11.

8. CAS goes high at 356, taking the EPROM 260 off
the data bus.

9. RAS goes high at 358, indicating the end of the
EPROM 260 access.

10. RAS goes low at 360, latching the DRAM select
information from the high order address bits. At the
same time, the RAS address bits are latched into the
DRAM 150. The DRAM 150 is selected.

11. CAS goes low at 362, latching the DRAM 150 CAS
addresses.

12. The microprocessor 50 places the previously
latched EPROM 260 32-bit data onto the external
address/data bus 350. W goes low at 364, writing the
32 bits into the DRAM 150.

13. W goes high at 366. CAS goes high at 368. The
process continues with the next word.
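
The data path of the steps above (four byte reads assembled
into one 32-bit word, then one DRAM write) can be sketched as
follows. The function names and the list-based memories are
hypothetical. The sketch also assumes that the first byte
read ends up in the most significant position, which is one
reading of the shift-left in step 4; the bus strobes (RAS,
CAS, W) are not modeled.

```python
def load_word_from_eprom(eprom, word_index):
    """Assemble one 32-bit word from four consecutive EPROM bytes."""
    word = 0
    for byte_select in range(4):            # byte select bits 00, 01, 10, 11
        byte = eprom[word_index * 4 + byte_select] & 0xFF
        # Earlier bytes move up one byte position; the new byte is
        # latched into the eight least significant bits.
        word = ((word << 8) | byte) & 0xFFFFFFFF
    return word


def prom_load(eprom, dram, eprom_word, dram_address, word_count):
    """Copy word_count 32-bit words from the EPROM image into DRAM."""
    for i in range(word_count):
        dram[dram_address + i] = load_word_from_eprom(eprom, eprom_word + i)


eprom = bytes([0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0])
dram = [0] * 4
prom_load(eprom, dram, eprom_word=0, dram_address=1, word_count=2)
```

The three arguments after the memories correspond to the
three items the PROM LOAD instruction is said to indicate:
the EPROM selection address, the DRAM address, and the
number of words.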

Figure 12 shows details of the microprocessor 50 
memory controller 118. In operation, bus requests stay 
present until they are serviced. CPU 70 requests are 
prioritized at 370 in the order of: 1, Parameter Stack; 2, 
Return Stack; 3, Data Fetch; 4, Instruction Fetch. The 
resulting CPU request signal and a DMA request signal are
supplied as bus requests to bus control 372, which
provides a bus grant signal at 374. Internal address bus
136 and a DMA counter 376 provide inputs to a multiplexer
378. Either a row address or a column address is
provided as an output from the multiplexer 378 to
multiplexed address bus 380. The multiplexed address
bus 380 and the internal data bus 90 provide address and
data inputs, respectively, to multiplexer 382. Shift
register 384 supplies row address strobe (RAS) 1 and 2
control signals to multiplexer 386 and column address
strobe (CAS) 1 and 2 control signals to multiplexer 388 on
lines 390 and 392. The shift register 384 also supplies
output enable (OE) and write (W) signals on lines 394 and
396 and a control signal on line 398 to multiplexer 382.
The shift register 384 receives a RUN signal on line 400
to generate a memory cycle and supplies a MEMORY READY
signal on line 402 when an access is complete.

STACK/REGISTER ARCHITECTURE

Most microprocessors use on-chip registers for
temporary storage of variables. The on-chip registers
access data faster than off-chip RAM. A few
microprocessors use an on-chip push down stack for
temporary storage.

A stack has the advantage of faster operation
compared to on-chip registers by avoiding the necessity to
select source and destination registers. (A math or logic
operation always uses the top two stack items as source
and the top of stack as destination.) The stack's
disadvantage is that it makes some operations clumsy.
Some compiler activities in particular require on-chip
registers for efficiency.

As shown in Figure 13, the microprocessor 50 provides
both on-chip registers 134 and a stack 74 and reaps the
benefits of both.

BENEFITS:

1. Stack math and logic operations are twice as fast as
those available on an equivalent register-only
machine. Most programmers and optimizing
compilers can take advantage of this feature.
2. Sixteen registers are available for on-chip
storage of local variables which can transfer
to the stack for computation. The accessing of
variables is three to four times as fast as
available on a strictly stack machine.

The combined stack 74/register 134 architecture has
not been used previously due to inadequate understanding
by computer designers of optimizing compilers and the mix
of transfer versus math/logic instructions.

ADAPTIVE MEMORY CONTROLLER 

A microprocessor must be designed to work with small
or large memory configurations. As more memory loads are
added to the data, address, and control lines, the
switching speed of the signals slows down. The
microprocessor 50 multiplexes the address/data bus three
ways, so timing between the phases is critical. A
traditional approach to the problem allocates a wide
margin of time between bus phases so that systems will
work with small or large numbers of memory chips
connected. A speed compromise of as much as 50% is
required.

As shown in Figure 14, the microprocessor 50 uses a
feedback technique to allow the processor to adjust memory
bus timing to be fast with small loads and slower with
large ones. The OUTPUT ENABLE (OE) line 152 from the
microprocessor 50 is connected to all memories 150 on the
circuit board. The loading on the output enable line 152
to the microprocessor 50 is directly related to the number
of memories 150 connected. By monitoring how rapidly OE
152 goes high after a read, the microprocessor 50 is able
to determine when the data hold time has been satisfied
and place the next address on the bus.

The level of the OE line 152 is monitored by CMOS
input buffer 410 which generates an internal READY signal
on line 412 to the microprocessor's memory controller.
Curves 414 and 416 of the Figure 15 graph show the
difference in rise time likely to be encountered from a
lightly to heavily loaded memory system. When the OE line
152 has reached a predetermined level to generate the
READY signal, driver 418 generates an OUTPUT ENABLE signal
on OE line 152.

SKIP WITHIN THE INSTRUCTION CACHE

The microprocessor 50 fetches four 8-bit instructions
each memory cycle and stores them in a 32-bit instruction
register 108, as shown in Figure 16. A class of "test and
skip" instructions can execute a very fast jump operation
within the four instruction cache.
SKIP CONDITIONS:
Always
ACC non-zero
ACC negative
Carry flag equal logic one
Never
ACC equal zero
ACC positive
Carry flag equal logic zero

The SKIP instruction can be located in any of the four 
byte positions 420 in the 32-bit instruction register 108. 
If the test is successful, SKIP will jump over the 
remaining one, two, or three 8-bit instructions in the 
instruction register 108 and cause the next 
four-instruction group to be loaded into the register 108. 
As shown, the SKIP operation is implemented by resetting 
the 2-bit microinstruction counter 180 to zero on line 422 
and simultaneously latching the next instruction group 
into the register 108. Any instructions following the 
SKIP in the instruction register are overwritten by the 
new instructions and not executed. 
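
The effect of a taken SKIP can be illustrated with a small
interpreter loop. This is a sketch only: instructions are
represented by hypothetical names rather than real op-codes,
and the tested condition is supplied as a callable instead of
being derived from the accumulator or carry flag.

```python
def run_groups(groups, condition_true):
    """Execute four-instruction groups; a taken SKIP abandons its group.

    groups: list of 4-tuples of instruction names ("SKIP" is test-and-skip).
    condition_true: zero-argument callable standing in for the tested condition.
    """
    executed = []
    for group in groups:                 # each pass latches a new group
        counter = 0                      # 2-bit microinstruction counter
        while counter < 4:
            op = group[counter]
            if op == "SKIP":
                if condition_true():
                    break                # counter resets, next group is loaded
            else:
                executed.append(op)
            counter += 1
    return executed


program = [("A", "SKIP", "B", "C"), ("D", "E", "F", "G")]
taken = run_groups(program, lambda: True)
not_taken = run_groups(program, lambda: False)
```

When the condition holds, instructions B and C are never
executed and control passes directly to the second group.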

The advantage of SKIP is that optimizing compilers
and smart programmers can often use it in place of the
longer conditional JUMP instruction. SKIP also makes
possible microloops which exit when the loop counts down
or when the SKIP jumps to the next instruction group. The
result is very fast code.

Other machines (such as the PDP-8 and Data General
NOVA) provide the ability to skip a single instruction.
The microprocessor 50 provides the ability to skip up to
three instructions.

MICROLOOP IN THE INSTRUCTION CACHE

The microprocessor 50 provides the MICROLOOP 
instruction to execute repetitively from one to three 
instructions residing in the instruction register 108. 
The microloop instruction works in conjunction with the 
LOOP COUNTER 92 (Figure 2) connected to the internal data 
bus 90. To execute a microloop, the program stores a 
count in LOOP COUNTER 92. MICROLOOP may be placed in the 
first, second, third, or last byte 420 of the instruction 
register 108. If placed in the first position, execution
will just create a delay equal to the number stored in 
LOOP COUNTER 92 times the machine cycle. If placed in the 
second, third, or last byte 420, when the microloop 
instruction is executed, it will test the LOOP COUNT for 
zero. If zero, execution will continue with the next 
instruction. If not zero, the LOOP COUNTER 92 is 
decremented and the 2-bit microinstruction counter is 
cleared, causing the preceding instructions in the 
instruction register to be executed again. 
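
The MICROLOOP behavior described above can be sketched as
follows. The function name and the string instruction names
are hypothetical, and only a single instruction group is
modeled; fetching of the following group is left out.

```python
def run_microloop(group, loop_counter):
    """Run one instruction group that may contain a MICROLOOP.

    Returns the list of executed instructions and the final counter value.
    """
    executed = []
    counter = 0                          # 2-bit microinstruction counter
    while counter < len(group):
        op = group[counter]
        if op == "MICROLOOP":
            if loop_counter != 0:
                loop_counter -= 1        # decrement LOOP COUNTER
                counter = 0              # clear the microinstruction counter
                continue                 # preceding instructions run again
        else:
            executed.append(op)
        counter += 1                     # count was zero, or an ordinary op
    return executed, loop_counter


body, remaining = run_microloop(("LOAD", "STORE", "MICROLOOP", "NOP"), 2)
delay, _ = run_microloop(("MICROLOOP", "A", "B", "C"), 3)
```

With a count of 2 the LOAD/STORE pair runs three times in
this model, because the pair executes once before the
counter is first tested. With MICROLOOP in the first
position nothing is repeated except the test itself, which
gives the pure delay mentioned in the text.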

Microloop is useful for block move and search 
operations. By executing a block move completely out of 
the instruction register 108, the speed of the move is 
doubled, since all memory cycles are used by the move 
rather than being shared with instruction fetching. 
Such a hardware implementation of microloops is much 
faster than conventional software implementation of a 
comparable function.

OPTIMAL CPU CLOCK SCHEME

The designer of a high speed microprocessor must
produce a product which operates over wide temperature
ranges, wide voltage swings, and wide variations in
semiconductor processing. Temperature, voltage, and
process all affect transistor propagation delays.
Traditional CPU designs are done so that with the worst
case of the three parameters, the circuit will function at
the rated clock speed. The result is designs that must
be clocked a factor of two slower than their maximum
theoretical performance, so they will operate properly in
worst case conditions.

The microprocessor 50 uses the technique shown in 
Figures 17-19 to generate the system clock and its
required phases. Clock circuit 430 is the familiar "ring
oscillator" used to test process performance. The clock 
is fabricated on the same silicon chip as the rest of the 
microprocessor 50. 

The ring oscillator frequency is determined by the
parameters of temperature, voltage, and process. At room
temperature, the frequency will be in the neighborhood of
100 MHz. At 70 degrees Centigrade, the speed will be 50
MHz. The ring oscillator 430 is useful as a system clock,
with its stages 431 producing phase 0-phase 3 outputs 433
shown in Figure 19, because its performance tracks the
parameters which similarly affect all other transistors on
the same silicon die. By deriving system timing from the
ring oscillator 430, CPU 70 will always execute at the
maximum frequency possible, but never too fast. For
example, if the processing of a particular die is not good,
resulting in slow transistors, the latches and gates on
the microprocessor 50 will operate slower than normal.
Since the microprocessor 50 ring oscillator clock 430 is
made from the same transistors on the same die as the
latches and gates, it too will operate slower (oscillating
at a lower frequency), providing compensation which allows
the rest of the chip's logic to operate properly.

ASYNCHRONOUS/SYNCHRONOUS CPU

Most microprocessors derive all system timing from a
single clock. The disadvantage is that different parts of
the system can slow all operations. The microprocessor 50
provides a dual-clock scheme as shown in Figure 17, with
the CPU 70 operating asynchronously to I/O interface 432
forming part of memory controller 118 (Figure 2) and the
I/O interface 432 operating synchronously with the
external world of memory and I/O devices. The CPU 70
executes at the fastest speed possible using the adaptive
ring counter clock 430. Speed may vary by a factor of
four depending upon temperature, voltage, and process.
The external world must be synchronized to the
microprocessor 50 for operations such as video display
updating and disc drive reading and writing. This
synchronization is performed by the I/O interface 432,
the speed of which is controlled by a conventional crystal
clock 434. The interface 432 processes requests for
memory accesses from the microprocessor 50 and
acknowledges the presence of I/O data. The microprocessor
50 fetches up to four instructions in a single memory
cycle and can perform much useful work before requiring
another memory access. By decoupling the variable speed
of the CPU 70 from the fixed speed of the I/O interface
432, optimum performance can be achieved by each.
Recoupling between the CPU 70 and the interface 432 is
accomplished with handshake signals on lines 436, with
data/addresses passing on bus 90, 136.

ASYNCHRONOUS/SYNCHRONOUS CPU IMBEDDED ON A DRAM CHIP

System performance is enhanced even more when the
DRAM 311 and CPU 314 (Figure 9) are located on the same
die. The proximity of the transistors means that
DRAM 311 and CPU 314 parameters will closely follow each
other. At room temperature, not only would the CPU 314
execute at 100 MHz, but the DRAM 311 would access fast
enough to keep up. The synchronization performed by the
I/O interface 432 would be for DMA and reading and writing
I/O ports. In some systems (such as calculators) no I/O
synchronization at all would be required, and the I/O
clock would be tied to the ring counter clock.

VARIABLE WIDTH OPERANDS



Many microprocessors provide variable width operands.
The microprocessor 50 handles operands of 8, 16, or 24
bits using the same op-code. Figure 20 shows the 32-bit
instruction register 108 and the 2-bit microinstruction
register 180 which selects the 8-bit instruction. Two
classes of microprocessor 50 instructions can be greater
than 8-bits, JUMP class and IMMEDIATE. A JUMP or
IMMEDIATE op-code is 8-bits, but the operand can be 8, 16,
or 24 bits long. This magic is possible because operands
must be right justified in the instruction register. This
means that the least significant bit of the operand is
always located in the least significant bit of the
instruction register. The microinstruction counter 180
selects which 8-bit instruction to execute. If a JUMP or
IMMEDIATE instruction is decoded, the state of the 2-bit
microinstruction counter selects the required 8, 16, or 24
bit operand onto the address or data bus. The unselected
8-bit bytes are loaded with zeros by operation of
decoder 440 and gates 442. The advantage of this
technique is the saving of a number of op-codes
required to specify the different operand sizes in other
microprocessors.
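
The operand selection can be sketched as a simple masking
operation. The function name is hypothetical, and the sketch
assumes that byte position 0 is the most significant byte of
the instruction register, so that an op-code in position 0
is followed by a 24-bit operand. That ordering is implied by
the right justification rule but is not stated explicitly.

```python
def select_operand(instruction_register, opcode_position):
    """Return (operand, width in bits) for a JUMP or IMMEDIATE op-code.

    opcode_position is the value of the 2-bit microinstruction counter
    when the op-code is decoded (0 = first byte executed).
    """
    if opcode_position not in (0, 1, 2):
        raise ValueError("an op-code in the last byte leaves no operand bits")
    operand_bits = 8 * (3 - opcode_position)
    mask = (1 << operand_bits) - 1
    # Bytes above the operand are forced to zero, as with the
    # decoder and gates described in the text.
    return instruction_register & mask, operand_bits


register = 0xA1B2C3D4
long_operand = select_operand(register, 0)
mid_operand = select_operand(register, 1)
short_operand = select_operand(register, 2)
```

The same op-code value therefore serves all three operand
sizes; only its position in the register changes.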

TRIPLE STACK CACHE

Computer performance is directly related to the
system memory bandwidth. The faster the memories, the
faster the computer. Fast memories are expensive, so
techniques have been developed to move a small amount of
high-speed memory around to the memory addresses where it
is needed. A large amount of slow memory is constantly
updated by the fast memory, giving the appearance of a
large fast memory array. A common implementation of the
technique is known as a high-speed memory cache. The
cache may be thought of as a fast acting shock absorber
smoothing out the bumps in memory access. When more
memory is required than the shock can absorb, it bottoms
out and slow speed memory is accessed. Most memory
operations can be handled by the shock absorber itself.

The microprocessor 50 architecture has the ALU 80
(Figure 2) directly coupled to the top two stack
locations 76 and 78. The access time of the stack 74
therefore directly affects the execution speed of the
processor. The microprocessor 50 stack architecture is
particularly suitable to a triple cache technique, shown
in Figure 21, which offers the appearance of a large stack
memory operating at the speed of on-chip latches 450.
Latches 450 are the fastest form of memory device built on
the chip, delivering data in as little as 3 nsec. However,
latches 450 require large numbers of transistors to
construct. On-chip RAM 452 requires fewer transistors
than latches, but is slower by a factor of five (15 nsec
access). Off-chip RAM 150 is the slowest storage of all.
The microprocessor 50 organizes the stack memory hierarchy
as three interconnected stacks 450, 452 and 454. The
latch stack 450 is the fastest and most frequently used.
The on-chip RAM stack 452 is next. The off-chip RAM stack
454 is slowest. The stack modulation determines the
effective access time of the stack. If a group of stack
operations never push or pull more than four consecutive
items on the stack, operations will be entirely performed
in the 3 nsec latch stack. When the four latches 456 are
filled, the data in the bottom of the latch stack 450 is
written to the top of the on-chip RAM stack 452. When the
sixteen locations 458 in the on-chip RAM stack 452 are
filled, the data in the bottom of the on-chip RAM stack
452 is written to the top of the off-chip RAM stack 454.
When popping data off a full stack 450, four pops will be
performed before stack empty line 460 from the latch stack
pointer 462 transfers data from the on-chip RAM stack 452.
By waiting for the latch stack 450 to empty before
performing the slower on-chip RAM access, the high
effective speed of the latches 456 is made available to
the processor. The same approach is employed with the
on-chip RAM stack 452 and the off-chip RAM stack 454.
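
The spill and refill behavior of the three stacks can be
sketched as follows. The class name and the spill counters
are hypothetical. The sizes (four latches, sixteen on-chip
RAM locations) come from the text; refilling one item at a
time once a faster level is empty is an assumption, since
the amount transferred on a refill is not stated.

```python
class TripleStack:
    """Toy three-level stack: latches, on-chip RAM, off-chip RAM."""

    LATCHES = 4
    ONCHIP = 16

    def __init__(self):
        # In each list the last element is the top of that level.
        self.latch, self.onchip, self.offchip = [], [], []
        self.spills = {"onchip": 0, "offchip": 0}

    def push(self, value):
        if len(self.latch) == self.LATCHES:
            if len(self.onchip) == self.ONCHIP:
                # Bottom of on-chip RAM moves to the top of off-chip RAM.
                self.offchip.append(self.onchip.pop(0))
                self.spills["offchip"] += 1
            # Bottom of the latch stack moves to the top of on-chip RAM.
            self.onchip.append(self.latch.pop(0))
            self.spills["onchip"] += 1
        self.latch.append(value)

    def pop(self):
        if not self.latch:
            # Slower levels are touched only when the latches are empty.
            if not self.onchip:
                if not self.offchip:
                    raise IndexError("pop from empty stack")
                self.onchip.append(self.offchip.pop())
            self.latch.append(self.onchip.pop())
        return self.latch.pop()


stack = TripleStack()
for value in range(25):
    stack.push(value)
levels = (len(stack.latch), len(stack.onchip), len(stack.offchip))
popped = [stack.pop() for _ in range(25)]
```

A run of pushes and pops that stays within four items never
leaves the latch level in this model, which is the case the
text identifies as running at full latch speed.
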
POLYNOMIAL GENERATION INSTRUCTION ~ _ ~ . ; * 

Polynomials are useful for error correction,
encryption, data compression, and fractal generation. A
polynomial is generated by a sequence of shift and
exclusive OR operations. Special chips are provided for
this purpose in the prior art.

The microprocessor 50 is able to generate polynomials
at high speed without external hardware by slightly
modifying how the ALU 80 works. As shown in Figure 21, a
polynomial is generated by loading the "order" (also known
as the feedback terms) into C register 470. The value
thirty one (resulting in 32 iterations) is loaded into
DOWN COUNTER 472. A register 474 is loaded with zero. B
register 476 is loaded with the starting polynomial value.
When the POLY instruction executes, C register 470 is
exclusively ORed with A register 474 if the least
significant bit of B register 476 is a one. Otherwise,
the contents of the A register 474 pass through the ALU
80 unaltered. The combination of A and B is then
shifted right (divided by 2) with shifters 478 and 480.
The operation automatically repeats the specified number
of iterations, and the resulting polynomial is left in A
register 474.
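
The POLY iteration can be sketched in Python as follows. This is a hedged reading of the paragraph above, not the circuit itself: the function name poly and its parameters are hypothetical, and the sketch assumes that "the combination of A and B" is a 64-bit pair with A as the upper half, so that the bit shifted out of A enters the top of B.

```python
MASK32 = 0xFFFFFFFF


def poly(order, start, count=31):
    """Model of the POLY instruction (assumed A:B pairing, A is the high half).

    order -- feedback terms loaded into the C register
    start -- starting polynomial value loaded into the B register
    count -- DOWN COUNTER value; the loop runs count + 1 times
    """
    a = 0                      # A register is loaded with zero
    b = start & MASK32         # B register
    c = order & MASK32         # C register
    for _ in range(count + 1):
        if b & 1:              # least significant bit of B selects the XOR
            a ^= c
        pair = ((a << 32) | b) >> 1   # shift the A:B combination right by one
        a = (pair >> 32) & MASK32
        b = pair & MASK32
    return a                   # result is left in the A register
```

With a starting value of zero the XOR is never selected, so the result stays zero regardless of the feedback terms; a single iteration with a starting value of one leaves the feedback terms shifted right by one place in A.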

FAST MULTIPLY 

Most microprocessors offer a 16 X 16 or 32 X 32 bit
multiply instruction. Multiply when performed
sequentially takes one shift/add per bit, or 32 cycles for
32 bit data. The microprocessor 50 provides a high speed
multiply which allows multiplication by small numbers
using only a small number of cycles. Figure 23 shows the
logic used to implement the high speed algorithm. To
perform a multiply, the size of the multiplier less one
is placed in the DOWN COUNTER 472. For a four bit
multiplier, the number three would be stored in the DOWN
COUNTER 472. Zero is loaded into the A register 474. The
multiplier is written bit reversed into the B register
476. For example, a bit reversed five (binary 0101) would
be written into B as 1010. The multiplicand is written
into the C register 470. Executing the FAST MULT
instruction will leave the result in the A register 474
when the count has been completed. The fast multiply
instruction is important because many applications scale
one number by a much smaller number. The difference in
speed between a 32 X 32 bit multiply and a 32 X 4 bit
multiply is a factor of 8. If the least significant bit of
the multiplier is a "ONE", the contents of the A register
474 and the C register 470 are added. If the least
significant bit of the multiplier is a "ZERO", the
contents of the A register are passed through the ALU 80
unaltered. The output of the ALU 80 is shifted left by
shifter 482 in each iteration. The contents of the B
register 476 are shifted right by the shifter 480 in each
iteration.
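
A minimal Python sketch of the fast multiply loop follows. The names fast_mult and bit_reverse are hypothetical. The text does not say whether the left shift is applied before or after the add within an iteration; this sketch assumes the shift comes first, so that the final add is not doubled and the A register ends up holding the plain product.

```python
def bit_reverse(value, width):
    """Reverse the low `width` bits of value (e.g. 0101 -> 1010 for width 4)."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def fast_mult(multiplicand, multiplier, width):
    """Shift/add multiply that takes one cycle per multiplier bit.

    Returns (product modulo 2**32, cycles used). Assumes shift-then-add
    ordering inside each iteration; the source leaves the order open.
    """
    a = 0                                # A register starts at zero
    b = bit_reverse(multiplier, width)   # B holds the bit-reversed multiplier
    c = multiplicand                     # C holds the multiplicand
    cycles = 0
    for _ in range(width):               # DOWN COUNTER is loaded with width - 1
        a <<= 1                          # shifter 482: shift left
        if b & 1:                        # LSB of B is the next multiplier bit
            a += c
        b >>= 1                          # shifter 480: shift B right
        cycles += 1
    return a & 0xFFFFFFFF, cycles
```

Because B is bit reversed, its least significant bit presents the multiplier bits most significant first, which makes the loop a Horner-style evaluation. A four bit multiplier finishes in four iterations instead of 32, which is the factor of 8 quoted above.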
INSTRUCTION EXECUTION PHILOSOPHY

The microprocessor 50 uses high speed D latches in 
most of the speed critical areas. Slower on-chip RAM is 
used as secondary storage. 

The microprocessor 50 philosophy of instruction
execution is to create a hierarchy of speed as follows:

Logic and D latch transfers         1 cycle      20 nsec
Math                                2 cycles     40 nsec
Fetch/store on-chip RAM             2 cycles     40 nsec
Fetch/store in current RAS page     4 cycles     80 nsec
Fetch/store with RAS cycle         11 cycles    220 nsec

With a 50 MHz clock, many operations can be performed in
20 nsec and almost everything else in 40 nsec.

To maximize speed, certain techniques in processor
design have been used. They include:

Eliminating arithmetic operations on addresses,
Fetching up to four instructions per memory cycle,
Pipelineless instruction decoding,
Generating results before they are needed,
Use of three level stack caching.
PIPELINE PHILOSOPHY 
Computer instructions are usually broken down into
sequential pieces, for example: fetch, decode, register
read, execute, and store. Each piece will require a
single machine cycle. In most Reduced Instruction Set
Computer (RISC) chips, instructions require from three to
six cycles.

RISC instructions are very parallel. For example,
each of 70 different instructions in the SPARC (SUN
Computer's RISC chip) has five cycles. Using a technique
called "pipelining", the different phases of consecutive
instructions can be overlapped.

To understand pipelining, think of building five
residential homes. Each home will require, in sequence, a
foundation, framing, plumbing and wiring, roofing, and
interior finish. Assume that each activity takes one
week. To build one house will take five weeks.

But what if you want to build an entire subdivision?
You have only one of each work crew, but when the
foundation men finish on the first house, you immediately
start them on the second one, and so on. At the end of
five weeks, the first home is complete, but you also have
five foundations. If you have kept the framing, plumbing,
roofing, and interior guys all busy, from five weeks on, a
new house will be completed each week.

This is the way a RISC chip like SPARC appears to
execute an instruction in a single machine cycle. In
reality, a RISC chip is executing one fifth of five
instructions each machine cycle. And if five instructions
stay in sequence, an instruction will be completed each
machine cycle.

The problem with a pipeline is keeping the pipe
full with instructions. Each time an out of sequence
instruction such as a BRANCH or CALL occurs, the pipe must
be refilled with the next sequence. The resulting dead
time to refill the pipeline can become substantial when
many IF/THEN/ELSE statements or subroutines are
encountered.
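
The cost of refilling a pipeline can be illustrated with a small cycle-count model in Python. This is a simplified sketch, not a description of any particular RISC chip: the function names are hypothetical, and the model assumes that every taken BRANCH or CALL empties the whole pipe, costing one less cycle than the number of stages to refill.

```python
def sequential_cycles(instructions, stages=5):
    """Cycles needed when each instruction runs all of its stages alone."""
    return instructions * stages


def pipelined_cycles(instructions, stages=5, taken_branches=0):
    """Cycles for an ideal pipeline that is flushed on every taken branch.

    Assumption: the first fill and each refill cost (stages - 1) cycles.
    """
    if instructions == 0:
        return 0
    fill = stages - 1
    return instructions + fill * (1 + taken_branches)
```

For the five houses in the analogy, pipelined_cycles(5) gives 9 weeks against 25 weeks built one at a time. With 100 instructions and 20 taken branches the same model gives 184 cycles rather than the ideal 104, which is the dead time the paragraph above refers to.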

THE PIPELINE APPROACH 

The microprocessor 50 has no pipeline as such. The
approach of this microprocessor to speed is to overlap
instruction fetching with execution of the previously
fetched instruction(s). Beyond that, over half the
instructions (the most common ones) execute entirely in a
single machine cycle of 20 nsec. This is possible
because:

1. Instruction decoding resolves in 2.5 nsec.
2. Incremented/decremented and some math values are
calculated before they are needed, requiring only a
latching signal to execute.
3. Slower memory is hidden from high speed operations
by high-speed D latches which access in 4 nsec.

The disadvantage for this microprocessor is a more
complex chip design process. The advantage for the chip
user is faster ultimate throughput since pipeline
stalls cannot exist. Pipeline synchronization with
availability flag bits and other such pipeline handling is
not required by this microprocessor.

For example, in some RISC machines an instruction
which tests a status flag may have to wait for up to four
cycles for the flag set by the previous instruction to be
available to be tested. Hardware and software debugging
is also somewhat easier because the user doesn't have to
visualize five instructions simultaneously in the pipe.
OVERLAPPING INSTRUCTION FETCH/EXECUTE

The slowest procedure the microprocessor 50 performs
is to access memory. Memory is accessed when data is read
or written. Memory is also read when instructions are
fetched. The microprocessor 50 is able to hide fetch of
the next instruction behind the execution of the
previously fetched instruction(s). The microprocessor 50
fetches instructions in 4-byte instruction groups. An
instruction group may contain from one to four
instructions. The amount of time required to execute the
instruction group ranges from 4 cycles for simple
instructions to 64 cycles for a multiply.

When a new instruction group is fetched, the
microprocessor instruction decoder looks at the most
significant bit of all four of the bytes. The most
significant bit of an instruction determines if a memory
access is required. For example, CALL, FETCH, and STORE
all require a memory access to execute. If all four bytes
have nonzero most significant bits, the microprocessor
initiates the memory fetch of the next sequential 4-byte
instruction group. When the last instruction in the group
finishes executing, the next 4-byte instruction group is
ready and waiting on the data bus needing only to be
latched into the instruction register. If the 4-byte
instruction group required four or more cycles to execute
and the next sequential access was a column address strobe
(CAS) cycle, the instruction fetch was completely
overlapped with execution.
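
The decoder's decision can be expressed as a short Python sketch. The function names are hypothetical, and the sketch only encodes the two rules stated above: the prefetch starts when every byte in the group has its most significant bit set, and the fetch is fully hidden when the group takes at least four cycles and the next access is a CAS cycle.

```python
def can_prefetch_next_group(group):
    """Return True when none of the four instructions needs the memory bus.

    group -- the four instruction bytes of one 4-byte instruction group.
    A set most significant bit marks an instruction with no memory access.
    """
    if len(group) != 4:
        raise ValueError("an instruction group is exactly four bytes")
    return all(byte & 0x80 for byte in group)


def fetch_fully_overlapped(execute_cycles, next_access_is_cas):
    """Return True when the next fetch hides completely behind execution."""
    return execute_cycles >= 4 and next_access_is_cas
```

For example, can_prefetch_next_group(bytes([0x80, 0x81, 0xFF, 0x90])) is True, while a group containing a byte such as 0x00 (most significant bit clear) makes it False, so the bus is left free for that instruction's own memory access.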
INTERNAL ARCHITECTURE 

The microprocessor 50 architecture consists of the 
following: 

PARAMETER STACK <--> ALU*     Used for math and logic.
<-- 32 BITS -->               Push down stack.
16 DEEP                       Can overflow into
                              off-chip RAM.

Y REGISTER
RETURN STACK                  Used for subroutine
<-- 32 BITS -->               and interrupt return
16 DEEP                       addresses as well as
                              local variables.
                              Push down stack.
                              Can overflow into
                              off-chip RAM.
                              Can also be accessed
                              relative to top of
                              stack.

LOOP COUNTER       (32-bits, can decrement by 1).
                   Used by class of test and loop
                   instructions.
X REGISTER         (32-bits, can increment or decrement by
                   4). Used to point to RAM locations.
PROGRAM COUNTER    (32-bits, increments by 4). Points to
                   4-byte instruction groups in RAM.
INSTRUCTION REG    (32-bits). Holds 4-byte instruction
                   groups while they are being decoded
                   and executed.

* Math and logic operations use the TOP item and
NEXT to top Parameter Stack items as the
operands. The result is pushed onto the
Parameter Stack.

* Return addresses from subroutines are placed
on the Return Stack. The Y REGISTER is used as
a pointer to RAM locations. Since the Y
REGISTER is the top item of the Return Stack,
nesting of indices is straightforward.

MODE - A register with mode and status bits.

MODE-BITS:

- Slow down memory accesses by 8 if "1". Run full
speed if "0". (Provided for access to slow EPROM.)

- Divide the system clock by 1023 if "1" to reduce
power consumption. Run full speed if "0". (On-chip
counters slow down if this bit is set.)

- Enable external interrupt 1.

- Enable external interrupt 2.

- Enable external interrupt 3.

- Enable external interrupt 4.

- Enable external interrupt 5.

- Enable external interrupt 6.

- Enable external interrupt 7.



ON-CHIP MEMORY LOCATIONS:

MODE-BITS
DMA-POINTER
DMA-COUNTER
STACK-POINTER      Pointer into Parameter Stack.
STACK-DEPTH        Depth of on-chip Parameter Stack.
RSTACK-POINTER     Pointer into Return Stack.
RSTACK-DEPTH       Depth of on-chip Return Stack.



ADDRESSING MODE HIGH POINTS 

The data bus is 32-bits wide. All memory fetches and
stores are 32-bits. Memory bus addresses are 30 bits.
The least significant 2 bits are used to select
one-of-four bytes in some addressing modes. The Program
Counter, X Register, and Y Register are implemented as D
latches with their outputs going to the memory address
bus and the bus incrementer/decrementer. Incrementing one
of these registers can happen quickly, because the
incremented value has already rippled through the inc/dec
logic and need only be clocked into the latch. Branches
and Calls are made to 32-bit word boundaries.



INSTRUCTION SET 

32-BIT INSTRUCTION FORMAT

The thirty two bit instructions are CALL, BRANCH,
BRANCH-IF-ZERO, and LOOP-IF-NOT-DONE. These instructions
require the calculation of an effective address. In many
computers, the effective address is calculated by adding
an operand to, or subtracting it from, the current Program
Counter. This math operation requires from four to seven
machine cycles to perform and can definitely bog down
machine execution. The microprocessor's strategy is to
perform the required math operation at assembly or linking
time and do a much simpler "Increment to next page" or
"Decrement to previous page" operation at run time. As a
result, the microprocessor branches execute in a single
cycle.

24-BIT OPERAND FORM:

Byte 1      Byte 2     Byte 3     Byte 4
WWWWWW XX - YYYYYYYY - YYYYYYYY - YYYYYYYY

With a 24-bit operand, the current page is
considered to be defined by the most
significant 6 bits of the Program Counter.

16-BIT OPERAND FORM:

QQQQQQQQ - WWWWWW XX - YYYYYYYY - YYYYYYYY

With a 16-bit operand, the current page is
considered to be defined by the most
significant 14 bits of the Program Counter.

8-BIT OPERAND FORM:

QQQQQQQQ - QQQQQQQQ - WWWWWW XX - YYYYYYYY

With an 8-bit operand, the current page is
considered to be defined by the most
significant 22 bits of the Program Counter.

QQQQQQQQ - Any 8-bit instruction.
WWWWWW - Instruction op-code.
XX - Select how the address bits will be used:

00 - Make all high-order bits zero. (Page zero
addressing)
01 - Increment the high-order bits. (Use next page)
10 - Decrement the high-order bits. (Use previous
page)
11 - Leave the high-order bits unchanged. (Use
current page)

YYYYYYYY - The address operand field. This field is
always shifted left two bits (to generate a word rather
than byte address) and loaded into the Program Counter.
The microprocessor instruction decoder figures out the
width of the operand field by the location of the
instruction op-code in the four bytes.

The compiler or assembler will normally use the
shortest operand required to reach the desired address so
that the leading bytes can be used to hold other
instructions. The effective address is calculated by
combining:

The current Program Counter,
The 8, 16, or 24 bit address operand in the
instruction,
Using one of the four allowed addressing
modes.

EXAMPLES OF EFFECTIVE ADDRESS CALCULATION 

EXAMPLE 1:

Byte 1    Byte 2    Byte 3    Byte 4
QQQQQQQQ  QQQQQQQQ  00000011  10011000

The "QQQQQQQQs" in Byte 1 and 2 indicate space in the
4-byte memory fetch which could hold two other
instructions to be executed prior to the CALL instruction.
Byte 3 indicates a CALL instruction (six zeros) in the
current page (indicated by the 11 bits). Byte 4 indicates
that the hexadecimal number 98 will be forced into the
Program Counter bits 2 through 10. (Remember, a CALL or
BRANCH always goes to a word boundary so the two least
significant bits are always set to zero). The effect of
this instruction would be to CALL a subroutine at WORD
location HEX 98 in the current page. The most significant
22 bits of the Program Counter define the current page and
will be unchanged.

EXAMPLE 2: 

Byte 1 Byte 2 Byte 3 Byte 4 
000001 01 00000001 00000000 00000000 

If we assume that the Program Counter was HEX 0000
0156 which is binary:

00000000 00000000 00000001 01010110 = OLD PROGRAM COUNTER.

Byte 1 indicates a BRANCH instruction op code (000001)
and "01" indicates select the next page. Bytes 2, 3, and 4
are the address operand. These 24-bits will be shifted to
the left two places to define a WORD address. HEX 0156
shifted left two places is HEX 0558. Since this is a
24-bit operand instruction, the most significant 6 bits of
the Program Counter define the current page. These six
bits will be incremented to select the next page.
Executing this instruction will cause the Program Counter
to be loaded with HEX 0400 0558 which is binary:

00000100 00000000 00000101 01011000 = NEW PROGRAM COUNTER.
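
The page-based address formation used in both examples can be sketched in Python. The function name effective_address and its argument order are hypothetical; the sketch assumes a 32-bit Program Counter whose upper bits (6, 14, or 22 of them, depending on operand width) form the page and wrap around on increment or decrement. The second call uses the operand value HEX 0156 quoted in the prose of Example 2.

```python
def effective_address(pc, operand, operand_bits, mode):
    """Combine the Program Counter page with an 8, 16, or 24 bit operand.

    mode -- the two XX bits: 0b00 page zero, 0b01 next page,
            0b10 previous page, 0b11 current page.
    """
    if operand_bits not in (8, 16, 24):
        raise ValueError("operand must be 8, 16, or 24 bits wide")
    low_bits = operand_bits + 2            # operand is shifted left two bits
    page_bits = 32 - low_bits              # 22, 14, or 6 page bits
    low = (operand & ((1 << operand_bits) - 1)) << 2
    page = pc >> low_bits
    if mode == 0b00:
        page = 0                           # page zero addressing
    elif mode == 0b01:
        page += 1                          # increment to next page
    elif mode == 0b10:
        page -= 1                          # decrement to previous page
    elif mode != 0b11:
        raise ValueError("mode must be a two bit value")
    page &= (1 << page_bits) - 1           # assumed wrap-around of the page
    return (page << low_bits) | low


# Example 1: 8-bit operand HEX 98, current page (PC value is arbitrary here)
print(hex(effective_address(0x12345400, 0x98, 8, 0b11)))
# Example 2: 24-bit operand HEX 0156, next page
print(hex(effective_address(0x00000156, 0x000156, 24, 0b01)))
```

No addition involving the operand takes place: the operand simply replaces the low bits and the page bits are passed through, cleared, incremented, or decremented, which is why the text can claim single cycle branches.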
INSTRUCTIONS 
CALL-LONG

0000 00XX - YYYYYYYY - YYYYYYYY - YYYYYYYY

Load the Program Counter with the effective WORD
address specified. Push the current PC contents onto the
RETURN STACK.

OTHER EFFECTS: CARRY or modes, no effect. May cause
Return Stack to force an external memory cycle if on-chip
Return Stack is full.

BRANCH

0000 01XX - YYYYYYYY - YYYYYYYY - YYYYYYYY

Load the Program Counter with the effective WORD
address specified.

OTHER EFFECTS: NONE

BRANCH-IF-ZERO

0000 10XX - YYYYYYYY - YYYYYYYY - YYYYYYYY

Test the TOP value on the Parameter Stack. If the
value is equal to zero, load the Program Counter with the
effective WORD address specified. If the TOP value is not
equal to zero, increment the Program Counter and fetch and
execute the next instruction.

OTHER EFFECTS: NONE

LOOP-IF-NOT-DONE

0000 11YY - (XXXX XXXX) - (XXXX XXXX) - (XXXX XXXX)

If the LOOP COUNTER is not zero, load the Program
Counter with the effective WORD address specified. If the
LOOP COUNTER is zero, decrement the LOOP COUNTER,
increment the Program Counter and fetch and execute the
next instruction.

OTHER EFFECTS: NONE

8-BIT INSTRUCTIONS PHILOSOPHY

Most of the work in the microprocessor 50 is done by
the 8-bit instructions. Eight bit instructions are
possible with the microprocessor because of the extensive
use of implied stack addressing. Many 32-bit
architectures use 8 bits to specify the operation to
perform but use an additional 24 bits to specify two
sources and a destination.

For math and logic operations, the microprocessor 50
exploits the inherent advantage of a stack by designating
the source operand(s) as the top stack item and the next
stack item. The math or logic operation is performed, the
operands are popped from the stack, and the result is
pushed back on the stack. The result is a very efficient
utilization of instruction bits as well as registers. A
comparable situation exists between Hewlett Packard
calculators (which use a stack) and Texas Instruments
calculators, which don't. The identical operation on an HP
will require one half to one third the keystrokes of the
TI.

The availability of 8-bit instructions also allows
another architectural innovation, the fetching of four
instructions in a single 32-bit memory cycle. The
advantages of fetching multiple instructions are:

Increased execution speed even with slow memories.
Similar performance to the Harvard architecture (separate
data and instruction busses) without the expense.
Opportunities to optimize groups of instructions.
The capability to perform loops within this
mini-cache.

The microloops inside the four instruction group are
effective for searches and block moves.
SKIP INSTRUCTIONS 

The microprocessor 50 fetches instructions in 32-bit 
chunks called 4-byte instruction groups. These four bytes 
may contain four 8-bit instructions or some mix of 8-bit 
and 16 or 24-bit instructions. SKIP instructions in the 
microprocessor skip any remaining instructions in a 
4-byte instruction group and cause a memory fetch to get 
the next 4-byte instruction group. Conditional SKIPs, when
combined with 3-byte BRANCHES, will create conditional
BRANCHES. SKIPs may also be used in situations when no
use can be made of the remaining bytes in a 4-byte
instruction group. A SKIP executes in a single cycle,
whereas a group of three NOPs would take three cycles.



SKIP-ALWAYS - Skip any remaining instructions in
this 4-byte instruction group.
Increment the most significant
30-bits of the Program Counter and
proceed to fetch the next 4-byte
instruction group.

SKIP-IF-ZERO - If the TOP item of the Parameter Stack
is zero, skip any remaining
instructions in the 4-byte instruction
group. Increment the most significant
30-bits of the Program Counter and
proceed to fetch the next 4-byte
instruction group. If the TOP item is
not zero, execute the next sequential
instruction.

SKIP-IF-POSITIVE - If the TOP item of the Parameter Stack
has its most significant bit (the
sign bit) equal to "0", skip any
remaining instructions in the 4-byte
instruction group. Increment the most
significant 30-bits of the Program
Counter and proceed to fetch the next
4-byte instruction group. If the sign
bit is not "0", execute the next
sequential instruction.

If the CARRY flag from a SHIFT or
arithmetic operation is not equal to
"1", skip any remaining instructions
in the 4-byte instruction group.
Increment the most significant 30-bits
of the Program Counter and
proceed to fetch the next 4-byte
instruction group. If the CARRY is
equal to "1", execute the next
sequential instruction.

SKIP-NEVER (NOP) - Execute the next sequential
instruction. (Delay one machine
cycle).

SKIP-IF-NOT-ZERO - If the TOP item on the Parameter Stack
is not equal to "0", skip any
remaining instructions in the 4-byte
instruction group. Increment the most
significant 30-bits of the Program
Counter and proceed to fetch the next
4-byte instruction group.
If the TOP item is equal to "0", execute
the next sequential instruction.

SKIP-IF-NEGATIVE - If the TOP item on the Parameter Stack
has its most significant bit (sign
bit) set to "1", skip any remaining
instructions in the 4-byte instruction
group. Increment the most significant
30-bits of the Program Counter and
proceed to fetch the next 4-byte
instruction group. If the TOP item
has its most significant bit set to
"0", execute the next sequential
instruction.

SKIP-IF-CARRY - If the CARRY flag is set to "1" as a
result of SHIFT or arithmetic
operation, skip any remaining
instructions in the 4-byte instruction
group. Increment the most significant
30-bits of the Program Counter and
proceed to fetch the next 4-byte
instruction group. If the CARRY flag
is "0", execute the next sequential
instruction.
MICROLOOPS 

Microloops are a unique feature of the microprocessor
architecture which allows controlled looping within a
4-byte instruction group. A microloop instruction tests
the LOOP COUNTER for "0" and may perform an additional
test. If the LOOP COUNTER is not "0" and the test is met,
instruction execution continues with the first instruction
in the 4-byte instruction group, and the LOOP COUNTER is
decremented. A microloop instruction will usually be the
last byte in a 4-byte instruction group, but it can be any
byte. If the LOOP COUNTER is "0" or the test is not met,
instruction execution continues with the next instruction.
If the microloop is the last byte in the 4-byte
instruction group, the most significant 30 bits of the
Program Counter are incremented and the next 4-byte
instruction group is fetched from memory. On a
termination of the loop on LOOP COUNTER equal to "0", the
LOOP COUNTER will remain at "0". Microloops allow short
iterative work such as moves and searches to be performed
without slowing down to fetch instructions from memory.
EXAMPLE:

Byte 1                       Byte 2
FETCH-VIA-X-AUTOINCREMENT    STORE-VIA-Y-AUTOINCREMENT

Byte 3                       Byte 4
ULOOP-UNTIL-DONE             QQQQQQQQ

This example will perform a block move. To initiate
the transfer, X will be loaded with the starting address
of the source. Y will be loaded with the starting address
of the destination. The LOOP COUNTER will be loaded with
the number of 32-bit words to move. The microloop will
FETCH and STORE and count down the LOOP COUNTER until it
reaches zero. QQQQQQQQ indicates any instruction can
follow.
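
The block move microloop can be simulated in Python. This is an illustrative sketch with hypothetical names: memory is modeled as a list of 32-bit words indexed by word address, and the Parameter Stack as a Python list. The text does not pin down whether the LOOP COUNTER is decremented before or after it is tested; the sketch assumes decrement first, so that a counter loaded with N moves exactly N words.

```python
def block_move(memory, x, y, loop_counter):
    """Simulate FETCH-VIA-X-AUTOINCREMENT, STORE-VIA-Y-AUTOINCREMENT,
    ULOOP-UNTIL-DONE executing inside one 4-byte instruction group.

    x, y -- word indices of the source and destination.
    Returns the final (x, y, loop_counter).
    Note: with a counter of 0 the group still runs once, moving one word.
    """
    stack = []                          # stands in for the Parameter Stack
    while True:
        stack.append(memory[x])         # FETCH-VIA-X-AUTOINCREMENT
        x += 1
        memory[y] = stack.pop()         # STORE-VIA-Y-AUTOINCREMENT
        y += 1
        if loop_counter > 0:            # ULOOP-UNTIL-DONE (assumed order:
            loop_counter -= 1           # decrement, then test for zero)
        if loop_counter == 0:
            break                       # fall through to the next instruction
    return x, y, loop_counter


memory = list(range(10)) + [0] * 10
print(block_move(memory, 0, 10, 5), memory[10:15])
```

The whole loop body lives in the instruction register, so in the hardware no instruction fetches compete with the data fetches and stores while the move runs.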

MICROLOOP INSTRUCTIONS 

ULOOP-UNTIL-DONE - If the LOOP COUNTER is not "0",
continue execution with the first
instruction in the 4-byte instruction
group. Decrement the LOOP COUNTER. If the
LOOP COUNTER is "0", continue execution
with the next instruction.

ULOOP-IF-ZERO - If the LOOP COUNTER is not "0" and the TOP
item on the Parameter Stack is "0",
continue execution with the first
instruction in the 4-byte instruction
group. Decrement the LOOP COUNTER. If the
LOOP COUNTER is "0" or the TOP item is "1",
continue execution with the next
instruction.

ULOOP-IF-POSITIVE - If the LOOP COUNTER is not "0" and the
most significant bit (sign bit) is "0",
continue execution with the first
instruction in the 4-byte instruction
group. Decrement the LOOP COUNTER. If the
LOOP COUNTER is "0" or the TOP item is "1",
continue execution with the next
instruction.

ULOOP-IF-NOT-CARRY-CLEAR - If the LOOP COUNTER is not "0"
and the floating point exponents found in
TOP and NEXT are not aligned, continue
execution with the first instruction in the
4-byte instruction group. Decrement the
LOOP COUNTER. If the LOOP COUNTER is "0"
or the exponents are aligned, continue
execution with the next instruction.
This instruction is specifically designed
for combination with special SHIFT
instructions to align two floating point
numbers.

ULOOP-NEVER - (DECREMENT-LOOP-COUNTER)
Decrement the LOOP COUNTER. Continue
execution with the next instruction.

ULOOP-IF-NOT-ZERO - If the LOOP COUNTER is not "0" and the
TOP item of the Parameter Stack is "0",
continue execution with the first
instruction in the 4-byte instruction
group. Decrement the LOOP COUNTER. If the
LOOP COUNTER is "0" or the TOP item is "1",
continue execution with the next
instruction.

ULOOP-IF-NEGATIVE - If the LOOP COUNTER is not "0" and the
most significant bit (sign bit) of the TOP
item of the Parameter Stack is "1",
continue execution with the first
instruction in the 4-byte instruction
group. Decrement the LOOP COUNTER. If the
LOOP COUNTER is "0" or the most significant
bit of the Parameter Stack is "0", continue
execution with the next instruction.

ULOOP-IF-CARRY-SET - If the LOOP COUNTER is not "0" and
the exponents of the floating point numbers
found in TOP and NEXT are not aligned,
continue execution with the first
instruction in the 4-byte instruction
group. Decrement the LOOP COUNTER. If the
LOOP COUNTER is "0" or the exponents are
aligned, continue execution with the next
instruction.
RETURN FROM SUBROUTINE OR INTERRUPT 

Subroutine calls and interrupt acknowledgements cause 
a redirection of normal program execution. In both cases, 
the current Program Counter is pushed onto the Return 
Stack, so the microprocessor can return to its place in 
the program after executing the subroutine or interrupt 
service routine. 

NOTE: When a CALL to subroutine or interrupt is
acknowledged, the Program Counter has already been
incremented and is pointing to the 4-byte instruction
group following the 4-byte group currently being executed.
The instruction decoding logic allows the microprocessor
to perform a test and execute a return conditional on the
outcome of the test in a single cycle. A RETURN pops an
address from the Return Stack and stores it to the
Program Counter.
RETURN INSTRUCTIONS 

RETURN-ALWAYS - Pop the top item from the Return Stack and
transfer it to the Program Counter.

RETURN-IF-ZERO - If the TOP item on the Parameter Stack is
"0", pop the top item from the Return Stack
and transfer it to the Program Counter.
Otherwise execute the next instruction.

RETURN-IF-POSITIVE - If the most significant bit (sign
bit) of the TOP item on the Parameter Stack
is a "0", pop the top item from the Return
Stack and transfer it to the Program
Counter. Otherwise execute the next
instruction.

RETURN-IF-CARRY-CLEAR - If the exponents of the floating
point numbers found in TOP and NEXT are not
aligned, pop the top item from the Return
Stack and transfer it to the Program
Counter. Otherwise execute the next
instruction.

RETURN-NEVER (NOP) - Execute the next instruction.

RETURN-IF-NOT-ZERO - If the TOP item on the Parameter
Stack is not "0", pop the top item from the
Return Stack and transfer it to the Program
Counter. Otherwise execute the next
instruction.

RETURN-IF-NEGATIVE - If the most significant bit (sign
bit) of the TOP item on the Parameter Stack
is a "1", pop the top item from the Return
Stack and transfer it to the Program
Counter. Otherwise execute the next
instruction.

RETURN-IF-CARRY-SET - If the exponents of the floating
point numbers found in TOP and NEXT are
aligned, pop the top item from the Return
Stack and transfer it to the Program
Counter. Otherwise execute the next
instruction.

HANDLING MEMORY FROM DYNAMIC RAM 

The microprocessor 50, like any RISC type
architecture, is optimized to handle as many operations as
possible on-chip for maximum speed. External memory
operations take from 80 nsec to 220 nsec compared with
on-chip memory speeds of from 4 nsec to 30 nsec. There
are times when external memory must be accessed.

External memory is accessed using three registers:

X-REGISTER - A 30-bit memory pointer which can be
used for memory access and simultaneously incremented
or decremented.

Y-REGISTER - A 30-bit memory pointer which can be
used for memory access and simultaneously incremented
or decremented.

PROGRAM-COUNTER - A 30-bit memory pointer normally
used to point to 4-byte instruction groups. External
memory may be accessed at addresses relative to the
PC. The operands are sometimes called "Immediate" or
"Literal" in other computers. When used as memory
pointer, the PC is also incremented after each
operation.
MEMORY LOAD & STORE INSTRUCTIONS 

FETCH-VIA-X - Fetch the 32-bit memory content pointed to 
by X and push it onto the Parameter Stack. 
X is unchanged. 

FETCH-VIA-Y - Fetch the 32-bit memory content pointed to 
by X and push it onto the Parameter Stack. 

Y is unchanged. 

FETCH -VI A -X - AUTOINCREMENT - Fetch the 32-bit memory 
content pointed to by X and push it onto 
the Parameter Stack. After fetching, 
increment the most significant 30 bits of X 
to point to the next 32-bit word address. 

FETCH-VIA-Y-AUTOINCREMENT - Fetch the 32-bit memory 
content pointed to by V and push it onto 
the Parameter Stack. After fetching, 
increment the most significant 30 bits of 

Y to point to the next 32-bit word address. 
FETCH- VI A - X - AUTO DECREMENT - Fetch the 32-bit memory 
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content pointed tc by X and push it cnto 
the Parameter Stack. - ~ After fetching, 
decrement the most significant 30 bits ef 
X to point to the previous 12-bit werd 
5 address. 

FETCH-VIA-Y-AUTODECREMENT - Fetch the 32-bit memory 
content pointed to by , Y and push it , onto 
the Parameter Stack. . After fetching, 
decrement the most significant , 30 >its ct 
-1° V ...to point to the previous 32-bit word 

address. ., v _. ; _ ^ , 

Pop the top, item of 4 the Parameter Stack and 
store it in the memory location pointed to 
by X. X is unchanged. 

Pop the top item of the Parameter Stack and 
store it in the memory location pointed to 
by Y. Y is unchanged. 
STORE -VIA-X-AUTOINCREMENT - Pop the top item of the 
Parameter Stack and store it in the memory location 
20 pointed to by X. After storing increment the most 
significant 30 bits of X to point to the next 32-bit word 
address. 

STORE-VIA-Y-AUTOINCREMENT - Pop the top item of the
Parameter Stack and store it in the memory
location pointed to by Y. After storing,
increment the most significant 30 bits of
Y to point to the next 32-bit word
address.

STORE-VIA-X-AUTODECREMENT - Pop the top item of the
Parameter Stack and store it in the memory
location pointed to by X. After storing,
decrement the most significant 30 bits of
X to point to the previous 32-bit word
address.

STORE-VIA-Y-AUTODECREMENT - Pop the top item of the
Parameter Stack and store it in the memory
location pointed to by Y. After storing,
decrement the most significant 30 bits of
Y to point to the previous 32-bit word
address.
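
The auto-increment and auto-decrement addressing described above can be sketched as follows. This is a minimal illustration only: it assumes a 32-bit byte address whose two least significant bits are left untouched when the upper 30 bits are stepped, and the function names and the dictionary-based memory are hypothetical, not taken from the specification.

```python
MASK32 = 0xFFFFFFFF

def step_word_address(reg, direction):
    """Add direction (+1 or -1) to the most significant 30 bits of reg.

    Assumption: the two least significant bits are preserved, since the
    text only says that the upper 30 bits change.
    """
    word = (reg >> 2) + direction            # upper 30 bits as a word index
    return ((word << 2) & MASK32) | (reg & 0x3)

def fetch_via_autoincrement(memory, stack, reg):
    """Push the word addressed by reg, then return the updated register."""
    stack.append(memory[reg >> 2])           # memory is keyed by word index
    return step_word_address(reg, +1)

def store_via_autodecrement(memory, stack, reg):
    """Pop the top of stack into the word addressed by reg, then step back."""
    memory[reg >> 2] = stack.pop()
    return step_word_address(reg, -1)
```

Calling fetch_via_autoincrement repeatedly with the returned register value walks through consecutive 32-bit words, which is the pattern the X and Y auto-increment instructions are described as supporting.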

FETCH-VIA-PC - Fetch the 32-bit memory content pointed to
by the Program Counter and push it onto the
Parameter Stack. After fetching, increment
the most significant 30 bits of the Program
Counter to point to the next 32-bit word
address.

*NOTE: When this instruction executes, the PC is
pointing to the memory location following
the instruction. The effect is of loading
a 32-bit immediate operand. This is an
8-bit instruction and therefore will be
combined with other 8-bit instructions in a
4-byte instruction fetch. It is possible
to have from one to four FETCH-VIA-PC
instructions in a 4-byte instruction fetch.
The PC increments after each execution of
FETCH-VIA-PC, so it is possible to push
four immediate operands on the stack. The
four operands would be found in the
four memory locations following the
instruction.
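
The immediate-operand behaviour in the note above can be sketched as follows. The sketch assumes a word-indexed memory and uses the string "NOP" as a hypothetical stand-in for any other 8-bit instruction; neither is defined by the specification.

```python
def run_group(memory, pc, group):
    """Execute one 4-byte instruction group.

    memory maps a word index to a 32-bit value, pc is the word index of the
    location following the group, and group is a list of up to four opcode
    names. Returns (stack, new_pc).
    """
    stack = []
    for op in group:
        if op == "FETCH-VIA-PC":
            stack.append(memory[pc])   # the operand is the word at the PC
            pc += 1                    # the PC then moves to the next word
        # any other opcode is ignored in this sketch
    return stack, pc
```

With four FETCH-VIA-PC opcodes in one group, the four words following the group are pushed in order and the PC ends up pointing past all of them.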

BYTE-FETCH-VIA-X - Fetch the 32-bit memory content pointed
to by the most significant 30 bits of X. 
Using the two least significant bits of X, 
select one of four bytes from the 32-bit 
memory fetch, right justify the byte in a 
32-bit field and push the selected byte 
preceded by leading zeros onto the 
Parameter Stack. 

BYTE-STORE-VIA-X - Fetch the 32-bit memory content pointed
to by the most significant 30 bits of X.
Pop the TOP item from the Parameter Stack.
Using the two least significant bits of X,
place the least significant byte into the
32-bit memory data and write the 32-bit
entity back to the location pointed to by
the most significant 30 bits of X.
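
The byte-select logic of the two byte instructions can be sketched as follows. The text does not say which byte of the word corresponds to which value of the two low address bits, so the ordering used here (value 0 selects the least significant byte) is purely an assumption, as are the function names.

```python
MASK32 = 0xFFFFFFFF

def byte_fetch(memory, x):
    """Return the byte selected by x, zero-extended to 32 bits."""
    word = memory[x >> 2]                  # upper 30 bits pick the word
    shift = 8 * (x & 0x3)                  # low 2 bits pick the byte (assumed order)
    return (word >> shift) & 0xFF

def byte_store(memory, x, top):
    """Read-modify-write: replace one byte of the addressed word."""
    word = memory[x >> 2]
    shift = 8 * (x & 0x3)
    word &= ~(0xFF << shift) & MASK32      # clear the selected byte lane
    word |= (top & 0xFF) << shift          # insert the low byte of TOP
    memory[x >> 2] = word
```

The store is a read-modify-write of the whole 32-bit word, which matches the description of fetching the word, merging the byte, and writing the 32-bit entity back.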
OTHER EFFECTS OF MEMORY ACCESS INSTRUCTIONS: 

Any FETCH instruction will push a value on the 
Parameter Stack 74. If the on-chip stack is full, the 
stack will overflow into off-chip memory stack resulting 
in an additional memory cycle. Any STORE instruction will 
pop a value from the Parameter Stack 74. If the on-chip 
stack is empty, a memory cycle will be generated to fetch 
a value from off-chip memory stack. 
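
The overflow and refill behaviour described above can be sketched as follows. The class name, the capacity parameter, and the choice to spill exactly one item at a time are assumptions made for illustration; the text only states that a full on-chip stack overflows into the off-chip memory stack and that an empty one is refilled from it.

```python
class SpillingStack:
    """On-chip stack of fixed capacity that spills to an off-chip list."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.on_chip = []         # top of stack is the last element
        self.off_chip = []        # models the off-chip memory stack
        self.memory_cycles = 0    # extra cycles caused by spills and refills

    def push(self, value):
        if len(self.on_chip) == self.capacity:
            # Full: move the oldest on-chip item out to off-chip memory.
            self.off_chip.append(self.on_chip.pop(0))
            self.memory_cycles += 1
        self.on_chip.append(value)

    def pop(self):
        if not self.on_chip:
            # Empty: bring one item back from off-chip memory first.
            self.on_chip.append(self.off_chip.pop())
            self.memory_cycles += 1
        return self.on_chip.pop()
```

The memory_cycles counter makes the cost visible: pushes and pops that stay on-chip add nothing, while each spill or refill adds one cycle.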
HANDLING ON-CHIP VARIABLES 

High-level languages often allow the creation of 
LOCAL VARIABLES. These variables are used by a particular
procedure and discarded. In cases of nested procedures, 
layers of these variables must be maintained. On-chip 
storage is up to five times faster than off-chip RAM, so a 
means of keeping local variables on-chip can make 
operations run faster. The microprocessor 50 provides the 
capability for both on-chip storage of local variables and 
nesting of multiple levels of variables through the Return 
Stack. 

The Return Stack 134 is implemented as 16 on-chip RAM 
locations. The most common use for the Return Stack 134 
is storage of return addresses from subroutines and 
interrupt calls. The microprocessor allows these 16 
locations to also be used as addressable registers. The 
16 locations may be read and written by two instructions 
which indicate a Return Stack relative address from 0-15. 
When high-level procedures are nested, the current 
procedure variables push the previous procedure variables 
further down the Return Stack 134. Eventually, the Return 
Stack will automatically overflow into off-chip RAM. 
ON-CHIP VARIABLE INSTRUCTIONS
READ-LOCAL-VARIABLE XXXX - Read the XXXXth item
relative to the top of the Return Stack.
(XXXX is a binary number from 0000-1111.)
Push the item read onto the Parameter
Stack.

OTHER EFFECTS: If the Parameter Stack is
full, the push operation will cause a
memory cycle to be generated as one item of
the stack is automatically stored to
external RAM. The logic which selects the
location performs a modulo 16 subtraction.
If four local variables have been pushed
onto the Return Stack, and an instruction
attempts to READ the fifth item, unknown
data will be returned.
WRITE-LOCAL-VARIABLE XXXX - Pop the TOP item of the 
Parameter Stack and write it into the 
XXXXth location relative to the top of the 
Return Stack. (XXXX is a binary number 
from 0000-1111.) 

OTHER EFFECTS: If the Parameter Stack is 
empty, the pop operation will cause a 
memory cycle to be generated to fetch the 
Parameter Stack item from external RAM. 
The logic which selects the location 
performs a modulo 16 subtraction. If four 
local variables have been pushed onto the 
Return Stack, and an instruction attempts 
to WRITE to the fifth item, it is possible 
to clobber return addresses or wreak other 
havoc.
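
The modulo 16 location selection can be sketched as follows. The direction in which the stack pointer moves on a push and the class and method names are assumptions; the text states only that the 16 locations are addressed relative to the top of the Return Stack by a modulo 16 subtraction.

```python
RSTACK_SIZE = 16

class ReturnStack:
    def __init__(self):
        self.ram = [None] * RSTACK_SIZE    # 16 on-chip locations
        self.top = 0                       # index of the current top item (assumed)

    def push(self, value):
        self.top = (self.top + 1) % RSTACK_SIZE
        self.ram[self.top] = value

    def _slot(self, offset):
        # Modulo 16 subtraction: offset 0 is the top, 1 the item below it, ...
        return (self.top - offset) % RSTACK_SIZE

    def read_local(self, offset):
        return self.ram[self._slot(offset)]

    def write_local(self, offset, value):
        self.ram[self._slot(offset)] = value
```

Because the subtraction wraps, an offset that reaches past the locals of the current procedure still selects some location: a read returns whatever happens to be there, and a write can overwrite a return address, as the text warns.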

REGISTER AND FLIP-FLOP TRANSFER AND PUSH INSTRUCTIONS 
DROP - Pop the TOP item from the Parameter Stack
and discard it.
SWAP - Exchange the data in the TOP Parameter
Stack location with the data in the NEXT
Parameter Stack location.
DUP - Duplicate the TOP item on the Parameter
Stack and push it onto the Parameter
Stack.

PUSH-LOOP-COUNTER - Push the value in LOOP COUNTER onto
the Parameter Stack.
POP-RSTACK-PUSH-TO-STACK - Pop the top item from the
Return Stack and push it onto the
Parameter Stack.
PUSH-X-REG - Push the value in the X Register onto the
Parameter Stack.
PUSH-STACK-POINTER - Push the value of the Parameter Stack
pointer onto the Parameter Stack.
PUSH-RSTACK-POINTER - Push the value of the Return Stack
pointer onto the Return Stack.
PUSH-MODE-BITS - Push the value of the MODE REGISTER onto
the Parameter Stack.
PUSH-INPUT - Read the 10 dedicated input bits and push
the value (right justified and padded with
leading zeros) onto the Parameter Stack.
SET-LOOP-COUNTER - Pop the TOP value from the Parameter
Stack and store it into LOOP COUNTER.
POP-STACK-PUSH-TO-RSTACK - Pop the TOP item from the
Parameter Stack and push it onto the Return
Stack.
SET-X-REG - Pop the TOP item from the Parameter Stack
and store it into the X Register.
SET-STACK-POINTER - Pop the TOP item from the Parameter
Stack and store it into the Stack Pointer.
SET-RSTACK-POINTER - Pop the TOP item from the Parameter
Stack and store it into the Return Stack Pointer.
SET-MODE-BITS - Pop the TOP value from the Parameter Stack
and store it into the MODE BITS.
SET-OUTPUT - Pop the TOP item from the Parameter Stack
and output it to the 10 dedicated output
bits.

OTHER EFFECTS: Instructions which push or
pop the Parameter Stack or Return Stack may
cause a memory cycle as the stacks overflow
back and forth between on-chip and off-chip
memory.

LOADING A SHORT LITERAL

A special case of register transfer instruction is
used to push an 8-bit literal onto the Parameter Stack.
This instruction requires that the 8 bits to be pushed
reside in the last byte of a 4-byte instruction group.
The instruction op-code loading the literal may reside in
ANY of the other three bytes in the instruction group.
EXAMPLE:

BYTE 1              BYTE 2     BYTE 3     BYTE 4
LOAD-SHORT-LITERAL  QQQQQQQQ   QQQQQQQQ   00001111

In this example, QQQQQQQQ indicates any other 8-bit
instruction. When Byte 1 is executed, binary 00001111
(HEX 0F) from Byte 4 will be pushed (right justified and
padded by leading zeros) onto the Parameter Stack. Then
the instructions in Byte 2 and Byte 3 will execute. The
microprocessor instruction decoder knows not to execute 
Byte 4. It is possible to push three identical 8-bit 

values as follows:

BYTE 1              BYTE 2              BYTE 3              BYTE 4
LOAD-SHORT-LITERAL  LOAD-SHORT-LITERAL  LOAD-SHORT-LITERAL  00001111

SHORT-LITERAL-INSTRUCTION

LOAD-SHORT-LITERAL - Push the 8-bit value found in Byte 4
of the current 4-byte instruction group
onto the Parameter Stack.
LOGIC INSTRUCTIONS
Logical and math operations use the stack for the
source of one or two operands and as the destination for
results. The stack organization is a particularly
convenient arrangement for evaluating expressions. TOP
indicates the top value on the Parameter Stack 74. NEXT
indicates the next to top value on the Parameter Stack 74.
AND - Pop TOP and NEXT from the Parameter Stack,
perform the logical AND operation on these
two operands, and push the result onto the
Parameter Stack.
OR - Pop TOP and NEXT from the Parameter Stack,
perform the logical OR operation on these
two operands, and push the result onto the
Parameter Stack.
XOR - Pop TOP and NEXT from the Parameter Stack,
perform the logical exclusive OR on these
two operands, and push the result onto the
Parameter Stack.
BIT-CLEAR - Pop TOP and NEXT from the Parameter Stack,
toggle all bits in NEXT, perform the
logical AND operation on TOP, and push the
result onto the Parameter Stack. (Another
way of understanding this instruction is
thinking of it as clearing all bits in TOP
that are set in NEXT.)
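
The four logic instructions can be sketched on a list-based stack as follows. The helper names are hypothetical, and the stack is modelled as a Python list whose last element is TOP; this is an illustration of the described behaviour, not the hardware data path.

```python
MASK32 = 0xFFFFFFFF

def binary_op(stack, fn):
    """Pop TOP and NEXT, apply fn(top, nxt), push the 32-bit result."""
    top = stack.pop()
    nxt = stack.pop()
    stack.append(fn(top, nxt) & MASK32)

def op_and(stack):
    binary_op(stack, lambda top, nxt: top & nxt)

def op_or(stack):
    binary_op(stack, lambda top, nxt: top | nxt)

def op_xor(stack):
    binary_op(stack, lambda top, nxt: top ^ nxt)

def op_bit_clear(stack):
    # Clear every bit of TOP that is set in NEXT: TOP AND (NOT NEXT).
    binary_op(stack, lambda top, nxt: top & ~nxt)
```

Each operation consumes two stack items and leaves one, so a sequence of such instructions evaluates an expression without naming any operand registers.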

MATH INSTRUCTIONS
Math instructions pop the TOP item and NEXT to top
item of the Parameter Stack 74 to use as the operands.
The results are pushed back on the Parameter Stack. The
CARRY flag is used to latch the "33rd bit" of the ALU
result.

ADD - Pop the TOP item and NEXT to top item from
the Parameter Stack, add the values
together and push the result back on the
Parameter Stack. The CARRY flag may be
changed.

ADD-WITH-CARRY - Pop the TOP item and the NEXT to top item
from the Parameter Stack, add the values
together. If the CARRY flag is "1",
increment the result. Push the ultimate
result back on the Parameter Stack. The
CARRY flag may be changed.
ADD-X - Pop the TOP item from the Parameter Stack
and read the third item from the top of the
Parameter Stack. Add the values together
and push the result back on the Parameter
Stack. The CARRY flag may be changed.
SUB - Pop the TOP item and NEXT to top item from
the Parameter Stack. Subtract NEXT from TOP
and push the result back on the Parameter
Stack. The CARRY flag may be changed.
SUB-WITH-CARRY - Pop the TOP item and NEXT to top item
from the Parameter Stack. Subtract NEXT
from TOP. If the CARRY flag is "1",
increment the result. Push the ultimate
result back on the Parameter Stack. The
CARRY flag may be changed.
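
The way the CARRY flag latches the "33rd bit" can be sketched for ADD and ADD-WITH-CARRY as follows. The class and function names are hypothetical. Subtraction is deliberately left out, because the text does not say how the carry is defined for a subtract.

```python
MASK32 = 0xFFFFFFFF

class Alu:
    def __init__(self):
        self.carry = 0

    def add(self, stack, with_carry=False):
        """ADD / ADD-WITH-CARRY: pop TOP and NEXT, push the 32-bit sum."""
        top = stack.pop()
        nxt = stack.pop()
        total = top + nxt + (self.carry if with_carry else 0)
        self.carry = (total >> 32) & 1       # latch the "33rd bit"
        stack.append(total & MASK32)

def add64(a, b):
    """Add two 64-bit values with a low-word ADD and a high-word ADD-WITH-CARRY."""
    alu = Alu()
    lo = [a & MASK32, b & MASK32]
    alu.add(lo)                              # sets CARRY if the low words overflow
    hi = [a >> 32, b >> 32]
    alu.add(hi, with_carry=True)             # propagates CARRY into the high words
    return (hi[0] << 32) | lo[0]
```

The add64 helper shows the usual reason for a carry-in variant: multi-word arithmetic is built by chaining one plain add with one or more adds that consume the latched carry.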

SUB-X -
SIGNED-MULT-STEP -
UNSIGNED-MULT-STEP -
SIGNED-FAST-MULT -
FAST-MULT-STEP -
UNSIGNED-DIV-STEP -
GENERATE-POLYNOMIAL -
ROUND -

COMPARE - Pop the TOP item and NEXT to top item from
the Parameter Stack. Subtract NEXT from
TOP. If the result has the most
significant bit equal to "0" (the result is
positive), push the result onto the
Parameter Stack. If the result has the
most significant bit equal to "1" (the
result is negative), push the old value of
TOP onto the Parameter Stack. The CARRY
flag may be affected.
SHIFT-LEFT - Shift the TOP Parameter Stack item left one
bit. The CARRY flag is shifted into the
least significant bit of TOP.
SHIFT-RIGHT - Shift the TOP Parameter Stack item right
one bit. The least significant bit of TOP
is shifted into the CARRY flag. Zero is
shifted into the most significant bit of
TOP.

DOUBLE-SHIFT-LEFT - Treating the TOP item of the Parameter
Stack as the most significant word of a
64-bit number and the NEXT stack item as
the least significant word, shift the
combined 64-bit entity left one bit. The
CARRY flag is shifted into the least
significant bit of NEXT.

DOUBLE-SHIFT-RIGHT - Treating the TOP item of the
Parameter Stack as the most significant
word of a 64-bit number and the NEXT stack
item as the least significant word, shift
the combined 64-bit entity right one bit.
The least significant bit of NEXT is
shifted into the CARRY flag. Zero is
shifted into the most significant bit of
TOP.
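
The single and double shifts can be sketched as pure functions on 32-bit words. The function names are hypothetical. The text does not say what happens to the bit shifted out of the top of a left shift, so the left-shift functions here simply discard it and return no new CARRY value; that is an assumption.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def shift_left(top, carry):
    """Return the new TOP; CARRY enters the least significant bit."""
    return ((top << 1) | carry) & MASK32

def shift_right(top):
    """Return (new TOP, new CARRY); zero enters the most significant bit."""
    return top >> 1, top & 1

def double_shift_left(top, nxt, carry):
    """Shift the 64-bit value TOP:NEXT left; CARRY enters bit 0 of NEXT."""
    wide = ((((top << 32) | nxt) << 1) | carry) & MASK64
    return wide >> 32, wide & MASK32

def double_shift_right(top, nxt):
    """Shift TOP:NEXT right; return (new TOP, new NEXT, new CARRY)."""
    wide = (top << 32) | nxt
    carry = wide & 1                 # least significant bit of NEXT
    wide >>= 1                       # zero enters the top bit of TOP
    return wide >> 32, wide & MASK32, carry
```

In the double shifts the bit that leaves one word crosses directly into the other, so TOP and NEXT behave as a single 64-bit quantity.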

OTHER INSTRUCTIONS 

FLUSH-STACK - Empty all on-chip Parameter Stack locations 
into off-chip RAM. (This instruction is
useful for multitasking applications.)
This instruction accesses a counter which 
holds the depth of the on-chip stack and 
can require from none to 16 external memory 
cycles. 

FLUSH-RSTACK - Empty all on-chip Return Stack locations
and can require from none to 16 external
memory cycles.

It should further be apparent to those skilled in the
art that various changes in form and details of the
invention as shown and described may be made. It is
intended that such changes be included within the spirit
and scope of the claims appended hereto.

WHAT IS CLAIMED IS:

1. A microprocessor system, comprising a central
processing unit, a dynamic random access memory, a bus
connecting said central processing unit to said dynamic
random access memory, and multiplexing means on said bus
between said central processing unit and said dynamic
random access memory, said multiplexing means being
connected and configured to provide row addresses, column
addresses and data on said bus.




2. The microprocessor system of Claim 1 in which 
said multiplexing means includes a plurality of latches 
for providing the row addresses to said dynamic random 
access memory.

3. A microprocessor system, comprising a central 
processing unit, a memory, a bus connecting said central 
processing unit to said memory, and means connected to 
said bus for fetching instructions for said central
processing unit on said bus, said means for fetching 
instructions being configured to fetch multiple sequential 
instructions in a single memory cycle. 

4. The microprocessor system of Claim 3 in which 
said central processing unit includes an arithmetic logic 
unit and a first push down stack connected to said
arithmetic logic unit, said first push down stack 
including means for storing a top item connected to a 
first input of said arithmetic logic unit and means for 
storing a next item connected to a second input of said 
arithmetic logic unit, said arithmetic logic unit having 
an output connected to said means for storing a top item. 



5. The microprocessor system of Claim 4 additionally 
comprising a second push down stack, said means for 
storing a top item being connected to provide an input to
said second push down stack.





6. The microprocessor system of Claim 5 in which
said second push down stack comprises a register file and
said means for storing a top item and said register file
are bidirectionally connected.

7. The microprocessor system of Claim 3 additionally
comprising means connected to said means for fetching
multiple instructions for determining if multiple
instructions fetched by said means for fetching multiple
instructions require a memory access, said means for
fetching multiple instructions fetching additional
multiple instructions if the multiple instructions do not
require a memory access.



8. The microprocessor system of Claim 3 in which
said microprocessor system, including said memory, is
contained in an integrated circuit, said memory is a
dynamic random access memory, and said means for fetching
multiple instructions includes a column latch for
receiving the multiple instructions.

9. The microprocessor system of Claim 3 additionally
comprising an instruction register for the multiple
instructions connected to said means for fetching
instructions, means connected to said instruction register
for supplying the multiple instructions in succession from
said instruction register, a counter connected to control
said means for supplying the multiple instructions to
supply the multiple instructions in succession, means for
decoding the multiple instructions connected to receive
the multiple instructions in succession from the means for
supplying the multiple instructions, said counter being
connected to said means for decoding to receive
incrementing and reset control signals from said means
for decoding, said means for decoding being configured to
supply the reset control signal to said counter and to
supply a control signal to said means for fetching
instructions in response to a SKIP instruction in the
multiple instructions.

10. The microprocessor system of Claim
additionally comprising a loop counter connected to
receive a decrement control signal from said means for
decoding, said means for decoding being configured to
supply the reset control signal to said counter and the
decrement control signal to said loop counter in response
to a MICROLOOP instruction in the multiple instructions.



11. The microprocessor system of Claim 3
additionally comprising an instruction register for the
multiple instructions connected to said means for fetching
instructions, means connected to said instruction register
for supplying the multiple instructions in succession from
said instruction register, a counter connected to control
said means for supplying the multiple instructions to
supply the multiple instructions in succession, means for
decoding the multiple instructions connected to receive
the multiple instructions in succession from the means for
supplying the multiple instructions, said counter being
connected to said means for decoding to receive
incrementing and reset control signals from said means
for decoding, said means for decoding being configured to
control said counter in response to an instruction
utilizing a variable width operand, and means connected
to said counter to select the variable width operand in
response to said counter.

12. A microprocessor system, comprising a central
processing unit, a dynamic random access memory, a bus
connecting said central processing unit to said dynamic
random access memory, a programmable read only memory
containing instructions connected to said bus, means
connected to said bus for fetching instructions for said
central processing unit on said bus, said means for
fetching instructions including means for assembling a
plurality of instructions from said programmable read only
memory and storing the plurality of instructions in said
dynamic random access memory.

13. A microprocessor system, comprising a central
processing unit, a direct memory access processing unit, a
memory, a bus connecting said central processing unit and
said direct memory access processing unit to said memory,
said memory containing instructions for said central
processing unit and said direct memory access processing
unit, said direct memory access processing unit including
means for fetching instructions for said central
processing unit on said bus and for fetching instructions
for said direct memory access processing unit on said bus.

14. A microprocessor system comprising an arithmetic
logic unit, a first push down stack connected to said
arithmetic logic unit, said first push down stack
including means for storing a top item connected to a
first input of said arithmetic logic unit and means for
storing a next item connected to a second input of said
arithmetic logic unit, said arithmetic logic unit having
an output connected to said means for storing a top item,
a register file, said means for storing a top item being
connected to provide an input to said register file.



15. The microprocessor system of Claim 14 in which
said register file comprises a second push down stack and
said means for storing a top item and said register file
are bidirectionally connected.

16. A data processing system, comprising a
microprocessor including a sensing circuit and a driver
circuit, a memory, and an output enable line connected
between said memory, said sensing circuit and said driver
circuit, said sensing circuit being configured to provide
a ready signal when said output enable line reaches a
predetermined electrical level, said microprocessor being
configured so that said driver circuit provides an
enabling signal on said output enable line responsive to
the ready signal.

17. The data processing system of Claim 16 in which
the predetermined electrical level is a predetermined
voltage.

18. The data processing system of Claim 17 in which 
said memory is a dynamic random access memory. 

19. A microprocessor system, comprising a central
processing unit and a ring counter variable speed system
clock connected to said central processing unit, said
central processing unit and said ring counter variable
speed system clock being provided in a single integrated
circuit.

20. The microprocessor system of Claim 19
additionally comprising an input/output interface
connected to exchange coupling control signals, addresses
and data with said input/output interface, and a second
clock independent of said ring counter variable speed
system clock connected to said input/output interface.



21. The microprocessor system of Claim 20 in which 
said second clock is a fixed frequency clock. 

22. A microprocessor system, comprising a central
processing unit, a memory, a bus connecting said central
processing unit to said memory, said central processing
unit including an arithmetic logic unit and a push down
stack connected to said arithmetic logic unit, said push
down stack including means for storing a top item
connected to a first input of said arithmetic logic unit
and means for storing a next item connected to a second
input of said arithmetic logic unit, said arithmetic logic
unit having an output connected to said means for storing
a top item, said push down stack having a first plurality
of stack elements configured as latches, a second
plurality of stack elements configured as a random access
memory, said first and second plurality of stack elements
and said central processing unit being provided in a
single integrated circuit, and a third plurality of stack
elements configured as a random access memory external to
said single integrated circuit.

23. The microprocessor system of Claim 22
additionally comprising a first pointer connected to said
first plurality of stack elements, a second pointer
connected to said second plurality of stack elements, and
a third pointer connected to said third plurality of stack
elements, said central processing unit being connected to
pop items from said first plurality of stack elements,
said first stack pointer being connected to said second
stack pointer to pop a first plurality of items from said
second plurality of stack elements when said first
plurality of stack elements are empty from successive pop
operations by said central processing unit, said second
stack pointer being connected to said third stack pointer
to pop a second plurality of items from said third
plurality of stack elements when said second plurality of
stack elements are empty from successive pop operations by

said central processing unit.

24. A microprocessor system, comprising a central
processing unit, said central processing unit including an
arithmetic logic unit, a first register connected to
supply a first input to said arithmetic logic unit, a
first shifter connected between an output of said
arithmetic logic unit and said first register, a second
register connected to receive a starting polynomial value,
an output of said second register being connected to a
second shifter, a least significant bit of said second
register being connected to said arithmetic logic unit, a
third register connected to supply feedback terms of a
polynomial to said arithmetic logic unit, a down counter
for counting down a number corresponding to digits of a
polynomial to be generated, connected to said arithmetic
logic unit, said arithmetic logic unit being responsive to
a polynomial instruction to carry out an exclusive OR of
the contents of said first register with the contents of
said third register if the least significant bit of said
second register is a "ONE" and to pass the contents of
said first register unaltered if the least significant bit
of said second register is a "ZERO", until said down
counter completes a count, the polynomial to be generated
resulting in said first register.




25. A microprocessor system, comprising a central
processing unit, said central processing unit including an
arithmetic logic unit, a result register connected to
supply a first input to said arithmetic logic unit, a
first, left shifting shifter connected between an output
of said arithmetic logic unit and said result register, a
multiplier register connected to receive a multiplier in
bit reversed form, an output of said multiplier register
being connected to a second, right shifting shifter, a
least significant bit of said second register being
connected to said arithmetic logic unit, a third register
connected to supply a multiplicand to said arithmetic
logic unit, a down counter, for counting down a number
corresponding to one less than the number of digits of the
multiplier, connected to said arithmetic logic unit, said
arithmetic logic unit being responsive to a multiply
instruction to add the contents of said result register
with the contents of said third register when the least
significant bit of said multiplier register is a "ONE" and
to pass the contents of said result register unaltered
when the least significant bit of said multiplier is a
"ZERO", until said down counter completes a count, the
product resulting in said first register.

26. A microprocessor system, comprising a central
processing unit, a dynamic random access memory, a bus
connecting said central processing unit to said dynamic
random access memory, and multiplexing means on said bus
between said central processing unit and said dynamic
random access memory, said multiplexing means being
connected and configured to provide row addresses, column
addresses and data on said bus, and
means connected to said bus for fetching instructions
for said central processing unit on said bus, said means
for fetching instructions being configured to fetch
multiple sequential instructions in a single memory cycle.

27. The microprocessor system of Claim 26 in which
said central processing unit includes an arithmetic logic
unit and a first push down stack connected to said
arithmetic logic unit, said first push down stack
including means for storing a top item connected to a
first input of said arithmetic logic unit and means for
storing a next item connected to a second input of said
arithmetic logic unit, said arithmetic logic unit having
an output connected to said means for storing a top item.



28. The microprocessor system of Claim 27
additionally comprising a second push down stack, said
means for storing a top item being connected to provide an
input to said second push down stack.



29. The microprocessor system of Claim 28 in which
said second push down stack comprises a register file and
said means for storing a top item and said register file
are bidirectionally connected.

30. The microprocessor system of Claim 29
additionally comprising means connected to said means for
fetching multiple instructions for determining if multiple
instructions fetched by said means for fetching multiple
instructions require a memory access, said means for
fetching multiple instructions fetching additional
multiple instructions if the multiple instructions do not
require a memory access.

31. The microprocessor system of Claim 30 in which 
said microprocessor system, including said memory, is 
contained in an integrated circuit, said memory is a 
dynamic random access memory, and said means for fetching 
multiple instructions includes a column latch for 
receiving the multiple instructions. 

32. The microprocessor system of Claim 30
additionally comprising an instruction register for the
multiple instructions connected to said means for fetching
instructions, means connected to said instruction register
for supplying the multiple instructions in succession from
said instruction register, a counter connected to control
said means for supplying the multiple instructions to
supply the multiple instructions in succession, means for
decoding the multiple instructions connected to receive
the multiple instructions in succession from the means for
supplying the multiple instructions, said counter being
connected to said means for decoding to receive
incrementing and reset control signals from said means
for decoding, said means for decoding being configured to
supply the reset control signal to said counter and to
supply a control signal to said means for fetching
instructions in response to a SKIP instruction in the
multiple instructions.

33. The microprocessor system of Claim 32
additionally comprising a loop counter connected to
receive a decrement control signal from said means for
decoding, said means for decoding being configured to
supply the reset control signal to said counter and the
decrement control signal to said loop counter in response
to a MICROLOOP instruction in the multiple instructions.



34. The microprocessor system of Claim 33 in which
said means for decoding is configured to control said
counter in response to an instruction utilizing a variable
width operand, said microprocessor system additionally
comprising means connected to said counter to select the
variable width operand in response to said counter.

35. The microprocessor system of Claim 34
additionally comprising a programmable read only memory
containing instructions connected to said bus, means
connected to said bus for fetching instructions for said
central processing unit on said bus, said means for
fetching instructions including means for assembling a
plurality of instructions from said programmable read only
memory and storing the plurality of instructions in said
dynamic random access memory.



36. The microprocessor system of Claim 35 
additionally comprising a direct memory access processing 
unit, said bus connecting said direct memory access 
processing unit to said dynamic random access memory, said 
dynamic random access memory containing instructions for 
said central processing unit and said direct memory access 
processing unit, said direct memory access processing unit 
including means for fetching instructions for said central 
processing unit on said bus and for fetching instructions 
for said direct memory access processing unit on said bus. 



37. The microprocessor system of Claim 36 in which 
said central processing unit includes an arithmetic logic 
unit, a first push down stack connected to said 
arithmetic logic unit, said first push down stack 
including means for storing a top item connected to a 
first input of said arithmetic logic unit and means for 
storing a next item connected to a second input of said 
arithmetic logic unit, said arithmetic logic unit having 
an output connected to said means for storing a top item, 
a register file, said means for storing a top item being 
connected to provide an input to said register file. 



38. The microprocessor system of Claim 37 in which 
said register file comprises a second push down stack and 
said means for storing a top item and said register file 
are bidirectionally connected. 



39. The microprocessor system of Claim 38 in which 
said microprocessor system includes a sensing circuit and 
a driver circuit, and an output enable line connected 
between said dynamic random access memory, said sensing 
circuit and said driver circuit, said sensing circuit 
being configured to provide a ready signal when said 
output enable line reaches a predetermined electrical 
level, said microprocessor system being configured so that 
said driver circuit provides an enabling signal on said 
output enable line responsive to the ready signal. 

40. The microprocessor system of Claim 39 in which 
the predetermined electrical level is a predetermined 
voltage. 








41. The microprocessor system of Claim 40 
additionally comprising a ring counter variable speed 
system clock connected to said central processing unit, 
said central processing unit and said ring counter 
variable speed system clock being provided in a single 
integrated circuit. 

42. The microprocessor system of Claim 41 
additionally comprising an input/output interface 
connected to exchange coupling control signals, addresses 
and data with said input/output interface, and a second 
clock independent of said ring counter variable speed 
system clock connected to said input/output interface. 

43. The microprocessor system of Claim 42 in which 
said second clock is a fixed frequency clock. 

44. The microprocessor system of Claim 43 in which 
said first push down stack has a first plurality of stack 
elements configured as latches, a second plurality of 
stack elements configured as a random access memory, said 
first and second plurality of stack elements and said 
central processing unit being provided in a single 
integrated circuit, and a third plurality of stack 
elements configured as a random access memory external to 
said single integrated circuit. 






45. The microprocessor system of Claim 44 
additionally comprising a first pointer connected to said 
first plurality of stack elements, a second pointer 
connected to said second plurality of stack elements, and 
a third pointer connected to said third plurality of stack 
elements, said central processing unit being connected to 
pop items from said first plurality of stack elements, 
said first stack pointer being connected to said second 
stack pointer to pop a first plurality of items from said 
second plurality of stack elements when said first 
plurality of stack elements are empty from successive pop 
operations by said central processing unit, said second 
stack pointer being connected to said third stack pointer 
to pop a second plurality of items from said third 
plurality of stack elements when said second plurality of 
stack elements are empty from successive pop operations by 
said central processing unit. 






46. The microprocessor system of Claim 45 
additionally comprising a first register connected to 
supply a first input to said arithmetic logic unit, a 
first shifter connected between an output of said 
arithmetic logic unit and said first register, a second 
register connected to receive a starting polynomial value, 
an output of said second register being connected to a 
second shifter, a least significant bit of said second 
register being connected to said arithmetic logic unit, a 
third register connected to supply feedback terms of a 
polynomial to said arithmetic logic unit, a down counter, 
for counting down a number corresponding to digits of a 
polynomial to be generated, connected to said arithmetic 
logic unit, said arithmetic logic unit being responsive to 
a polynomial instruction to carry out an exclusive OR of 
the contents of said first register with the contents of 
said third register if the least significant bit of said 
second register is a "ONE" and to pass the contents of 
said first register unaltered if the least significant bit 
of said second register is a "ZERO", until said down 
counter completes a count, the polynomial to be generated 
resulting in said first register. 
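
The register loop recited in Claim 46 can be sketched in software. The following is only an illustrative model under stated assumptions: the claim does not give shift directions or a register width, so the sketch assumes a one-bit left shift for the first shifter, a one-bit right shift for the second shifter, a first register that starts at zero, and a 32-bit width. The function name `polynomial_loop` and its parameters are hypothetical.

```python
def polynomial_loop(start_value, feedback_terms, steps, width=32):
    """Model of the conditional exclusive-OR loop of Claim 46.

    Assumptions (not stated in the claim): the first shifter shifts left
    by one bit, the second shifter shifts right by one bit, the first
    register starts at zero, and registers are `width` bits wide.
    """
    mask = (1 << width) - 1
    first = 0                        # first register (holds the result)
    second = start_value & mask      # second register (starting polynomial value)
    third = feedback_terms & mask    # third register (feedback terms)
    for _ in range(steps):           # the down counter counts `steps` iterations
        # ALU: exclusive OR when the least significant bit is ONE, else pass through
        alu_out = first ^ third if second & 1 else first
        first = (alu_out << 1) & mask  # first shifter between ALU output and register
        second >>= 1                   # second shifter exposes the next bit
    return first
```

Under these assumptions the loop behaves like a carry-less multiply-accumulate over the bits of the starting value; a different choice of shift direction would give a different bit ordering.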



47. The microprocessor system of Claim 46 in which 
said first register is a result register, said first 
shifter is a left shifting shifter, said second register 
is a multiplier register connected to receive a 
multiplier in bit reversed form, said second shifter is a 
right shifting shifter, said third register is connected 
to supply a multiplicand to said arithmetic logic unit, 
said down counter is configured for counting down a number 
corresponding to one less than the number of digits of the 
multiplier, said arithmetic logic unit being responsive to 
a multiply instruction to add the contents of said result 
register with the contents of said third register, if the 
least significant bit of said second register is a "ONE" 
and to pass the contents of said first register unaltered 
if the least significant bit of said second register is a 
"ZERO" until said down counter completes a count, the 
product resulting in said first register. 
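
The multiply arrangement of Claim 47 can be illustrated with a short software model. This is a sketch under assumptions, not the circuit itself: it assumes the final add is not followed by a shift, which is one reading of the down counter counting "one less than the number of digits", and it models registers as unbounded integers. The names `reverse_bits` and `shift_add_multiply` are hypothetical.

```python
def reverse_bits(value, digits):
    """Return `value` with its lowest `digits` bits in reversed order."""
    out = 0
    for _ in range(digits):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def shift_add_multiply(multiplier, multiplicand, digits):
    """Shift-and-add multiply in the style of Claim 47.

    The multiplier is loaded in bit reversed form, so its least
    significant bit is the original most significant bit. Assumption:
    the last add is not followed by a left shift.
    """
    second = reverse_bits(multiplier, digits)  # multiplier register
    result = 0                                 # result (first) register
    for step in range(digits):
        # ALU: add the multiplicand when the least significant bit is ONE
        alu_out = result + multiplicand if second & 1 else result
        if step < digits - 1:
            alu_out <<= 1                      # left shifting shifter
        result = alu_out
        second >>= 1                           # right shifting shifter
    return result
```

For example, `shift_add_multiply(13, 11, 4)` walks the reversed multiplier 1011 and accumulates 11, 33, 66 and finally 143 in the result register.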



48. A microprocessor, which comprises a main central 
processing unit and a separate direct memory access 
central processing unit in a single integrated circuit 
comprising said microprocessor, said main central 
processing unit having an arithmetic logic unit, a first 
push down stack with a top item register and a next item 
register, connected to provide inputs to said arithmetic 
logic unit, an output of said arithmetic logic unit being 
connected to said top item register, said top item 
register also being connected to provide inputs to an 
internal data bus, said internal data bus being 
bidirectionally connected to a loop counter, said loop 
counter being connected to a decrementer, said internal 
data bus being bidirectionally connected to a stack 
pointer, return stack pointer, mode register and 
instruction register, said internal data bus being 
connected to a memory controller, to a Y register of a 
return push down stack, an X register and a program 
counter, said Y register, X register and program counter 
providing outputs to an internal address bus, said 
internal address bus providing inputs to said memory 
controller and to an incrementer, said incrementer being 
connected to said internal data bus, said direct memory 
access central processing unit providing inputs to said 
memory controller, said memory controller having an 
address/data bus and a plurality of control lines for 
connection to a random access memory. 



49. The microprocessor of Claim 48 in which said 
memory controller includes a multiplexing means between 
said central processing unit and said address/data bus, 
said multiplexing means being connected and configured to 
provide row addresses, column addresses and data on said 
address/data bus. 
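
The multiplexing of row addresses, column addresses and data onto one bus, as recited in Claim 49, can be pictured as a fixed sequence of bus phases. The sketch below is illustrative only: the split of an address into row and column fields, the default field width of 10 bits, and the function name `bus_sequence` are assumptions not taken from the claim.

```python
def bus_sequence(address, data, column_bits=10):
    """Return the values placed on a shared address/data bus, in order.

    Assumption: the low `column_bits` bits of the address form the
    column address and the remaining high bits form the row address.
    """
    row = address >> column_bits
    column = address & ((1 << column_bits) - 1)
    # The same bus lines carry the row address, then the column address,
    # then the data word.
    return [("row", row), ("column", column), ("data", data)]
```

With `column_bits=8`, the address 0x12345 would be presented as row 0x123 followed by column 0x45, after which the data word uses the same lines.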



50. The microprocessor of Claim 48 in which said 
memory controller includes means for fetching instructions 
for said central processing unit on said address/data bus, 
said means for fetching instructions being configured to 
fetch multiple sequential instructions in a single memory 
cycle. 






51. The microprocessor of Claim 50 additionally 
comprising means connected to said means for fetching 
instructions for determining if multiple instructions 
fetched by said means for fetching instructions require a 
memory access, said means for fetching instructions 
fetching additional multiple instructions if the multiple 
instructions do not require a memory access. 
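
The determination recited in Claim 51 can be sketched as a check over the instructions packed into one fetched word. This is a minimal illustration under assumptions: the 8-bit instruction size, the four instructions per word, and the opcode values in `MEMORY_OPCODES` are hypothetical and are not taken from the claims.

```python
INSTRUCTION_BITS = 8            # assumed width of one instruction
INSTRUCTIONS_PER_WORD = 4       # assumed number of instructions per fetch
MEMORY_OPCODES = {0x10, 0x11}   # hypothetical opcodes that access memory


def unpack(word):
    """Split one fetched word into its instructions, first instruction first."""
    mask = (1 << INSTRUCTION_BITS) - 1
    return [
        (word >> (INSTRUCTION_BITS * (INSTRUCTIONS_PER_WORD - 1 - i))) & mask
        for i in range(INSTRUCTIONS_PER_WORD)
    ]


def can_prefetch(word):
    """True when no instruction in the word needs the memory bus.

    In that case the bus is free, so the next group of instructions
    could be fetched while the current group executes.
    """
    return not any(op in MEMORY_OPCODES for op in unpack(word))
```

A word such as 0x01020304 contains no assumed memory opcode, so `can_prefetch` returns True for it; 0x01100304 contains 0x10 and returns False.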






52. The microprocessor of Claim 50 in which said 
microprocessor and a dynamic random access memory are 
contained in a single integrated circuit and said means for 
fetching instructions includes a column latch for 
receiving the multiple instructions. 

53. The microprocessor of Claim 48 in which said 
microprocessor includes a sensing circuit and a driver 
circuit, and an output enable line for connection between 
the random access memory, said sensing circuit and said 
driver circuit, said sensing circuit being configured to 
provide a ready signal when said output enable line 
reaches a predetermined electrical level, said 
microprocessor being configured so that said driver 
circuit provides an enabling signal on said output enable 
line responsive to the ready signal. 

54. The microprocessor of Claim 48 additionally 
comprising a ring counter variable speed system clock 
connected to said main central processing unit, said main 
central processing unit and said ring counter variable 
speed system clock being provided in a single integrated 
circuit. 

55. The microprocessor of Claim 54 in which said 
memory controller includes an input/output interface 
connected to exchange coupling control signals, addresses 
and data with said main central processing unit, said 
microprocessor additionally including a second clock 
independent of said ring counter variable speed system 
clock connected to said input/output interface. 

56. The microprocessor of Claim 48 in which said 
first push down stack has a first plurality of stack 
elements configured as latches, a second plurality of 
stack elements configured as a random access memory, said 
first and second plurality of stack elements and said 
central processing unit being provided in a single 
integrated circuit, and a third plurality of stack 
elements configured as a random access memory external to 
said single integrated circuit. 

57. The microprocessor of Claim 56 additionally 
comprising a first pointer connected to said first 
plurality of stack elements, a second pointer connected to 
said second plurality of stack elements, and a third 
pointer connected to said third plurality of stack 
elements, said central processing unit being connected to 
pop items from said first plurality of stack elements, 
said first stack pointer being connected to said second 
stack pointer to pop a first plurality of items from said 
second plurality of stack elements when said first 
plurality of stack elements are empty from successive pop 
operations by said central processing unit, said second 
stack pointer being connected to said third stack pointer 
to pop a second plurality of items from said third 
plurality of stack elements when said second plurality of 
stack elements are empty from successive pop operations by 
said central processing unit. 
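
The three-level stack of Claims 56 and 57 can be modeled in software as three tiers that refill one another on pop. The sketch below is a simplified illustration: the tier sizes, the block size moved between tiers, and the spill behavior on push are assumptions (the claims recite only the pop direction), and the class name `TieredStack` is hypothetical.

```python
class TieredStack:
    """Push down stack split across latches, on-chip RAM and external RAM.

    Assumptions: `block` items move between tiers at a time, and a push
    onto full latches spills the oldest items downward.
    """

    def __init__(self, latch_size=4, ram_size=8, block=2):
        self.latch_size = latch_size
        self.ram_size = ram_size
        self.block = block
        self.latches = []    # first plurality of stack elements (top at end)
        self.ram = []        # second plurality, on the same integrated circuit
        self.external = []   # third plurality, external RAM

    def _spill(self, src, dst):
        """Move the oldest `block` items of src onto the top of dst."""
        n = min(self.block, len(src))
        dst.extend(src[:n])
        del src[:n]

    def _refill(self, src, dst):
        """Move the newest `block` items of src into an empty dst."""
        n = min(self.block, len(src))
        if n:
            dst.extend(src[-n:])
            del src[-n:]

    def push(self, item):
        if len(self.latches) >= self.latch_size:
            if len(self.ram) + self.block > self.ram_size:
                self._spill(self.ram, self.external)
            self._spill(self.latches, self.ram)
        self.latches.append(item)

    def pop(self):
        if not self.latches:
            if not self.ram:
                self._refill(self.external, self.ram)  # third tier feeds second
            self._refill(self.ram, self.latches)       # second tier feeds first
        if not self.latches:
            raise IndexError("pop from empty stack")
        return self.latches.pop()
```

Most pops touch only the latch tier; the slower tiers are accessed only when the tier above them runs empty, which is the behavior the claims describe.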



58. In a microprocessor system, a method for 
fetching instructions, each having a first plurality of 
bits, from a memory, which comprises providing an 
instruction register having a second plurality of bits 
constituting a multiple of the first plurality of bits, 
fetching a first set of multiple sequential instructions 
in a single memory cycle, storing the multiple sequential 
instructions in the instruction register, determining if 
the multiple instructions require a memory access, and 
fetching a second set of multiple instructions during 
execution of the first set of multiple instructions if the 
first set of multiple instructions do not require access 
to the memory. 

59. The method of Claim 58 in which a portion of the 
multiple sequential instructions are skipped in response 
to a SKIP instruction. 



60. The method of Claim 58 in which a portion of the 
multiple sequential instructions are repeated a 
predetermined number of times in response to a MICROLOOP 
instruction. 
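
The effect of SKIP and MICROLOOP on a group of instructions held in the instruction register can be sketched as follows. This is an illustrative model only: instructions are represented as strings, and the exact loop semantics (repeat from the first instruction of the group while the loop counter is nonzero, decrementing it each time) are an assumption rather than something the claims spell out. The function name `run_group` is hypothetical.

```python
def run_group(instructions, loop_count):
    """Execute one fetched group; return (trace of executed ops, final loop count).

    Assumptions: SKIP abandons the rest of the group so the next group
    can be fetched; MICROLOOP resets the slot counter to the start of
    the group and decrements the loop counter until it reaches zero.
    """
    trace = []
    slot = 0                          # counter selecting the current instruction
    while slot < len(instructions):
        op = instructions[slot]
        if op == "SKIP":
            break                     # remaining instructions are skipped
        if op == "MICROLOOP":
            if loop_count > 0:
                loop_count -= 1       # decrement control signal
                slot = 0              # reset control signal
            else:
                slot += 1             # loop finished, fall through
            continue
        trace.append(op)
        slot += 1
    return trace, loop_count
```

With a loop count of 2, the group `["A", "B", "MICROLOOP", "C"]` runs A and B three times before C, without any further instruction fetch from memory.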

61. The method of Claim 58 additionally comprising 
the steps of storing an instruction utilizing a variable 
width operand and the variable width operand in said 
instruction register, determining if the instruction 
utilizes a variable width operand, and selecting the width 
of the operand for output from said instruction register 
in response to the instruction using the variable width 
operand. 

62. The method of Claim 58 additionally comprising 
the steps of storing a plurality of instructions in a read 
only memory, fetching selected instructions from the 
plurality of instructions, assembling the multiple 
sequential instructions, and storing the multiple 
sequential instructions in a random access memory prior to 
fetching the multiple sequential instructions. 

63. In a microprocessor connected to a memory by an 
output enable line, a method for determining when an 
enable signal can be sent to said memory, which comprises 
sensing a predetermined electrical level on said output 
enable line, and providing the enabling signal on said 
output line in response to the predetermined electrical 
level. 

64. The method of Claim 63 in which the 
predetermined electrical level is a voltage. 

65. In a microprocessor integrated circuit, a method 
for clocking the microprocessor, which comprises 
fabricating a ring counter system clock and the 
microprocessor each having a plurality of transistors 
having operating characteristics which vary in the same 
way with variations in their fabrication, and using the 
ring counter system clock for clocking the microprocessor. 

66. The method of Claim 65 additionally comprising 
the steps of providing an input/output interface for the 
microprocessor integrated circuit and clocking the 
input/output interface with a second clock independent of 
the ring counter system clock. 

67. The method of Claim 66 in which the second clock 
is a fixed frequency clock. 



68. In a microprocessor system, a method for 
operating a push down stack, which comprises providing a 
first plurality of stack elements configured as latches, a 
second plurality of stack elements configured as a random 
access memory, the first and second plurality of stack 
elements being provided in a single integrated circuit 
with the microprocessor, providing a third plurality of 
stack elements configured as a random access memory 
external to the single integrated circuit, storing items 
in the push down stack, popping up to a first plurality 
of items from the first plurality of stack elements 
without accessing the second plurality of stack elements, 
popping a first plurality of items from the second 
plurality of stack elements when the first plurality of 
stack elements are empty, popping up to the second 
plurality of items from the second plurality of stack 
elements without accessing the third plurality of stack 
elements, and popping a second plurality of items from the 
third plurality of stack elements when the second 
plurality of stack elements are empty. 

69. A method for generating a polynomial, which 
comprises providing a starting polynomial value, right 
shifting feedback terms for the polynomial, determining if 
a least significant bit of the starting polynomial value 
is a "ONE" or a "ZERO", performing an exclusive OR of the 
shifted feedback terms for the polynomial with the 
feedback terms for the polynomial if the least significant 
bit of the starting polynomial is a "ONE", right shifting 
the shifted feedback terms for the polynomial if the least 
significant bit of the starting polynomial is a 
"ZERO", and repeating the above operations a total number 
of times equal to the number of digits of the polynomial 
to be generated. 


70. A method of multiplying, which comprises 
providing a multiplier, a multiplicand and a "ZERO", 
determining if a least significant bit of the multiplier 
is a "ONE" or a "ZERO", adding the multiplicand and the 
"ZERO" and shifting the sum left if the least significant 
bit of the multiplicand is a "ONE", storing the "ZERO" if 
the least significant bit of the starting polynomial 
is a "ZERO", to give a partial result, shifting the 
multiplier right to give a right shifted multiplier, and 
repeating the above operations, using the right shifted 
multiplier in place of the multiplier and the partial 
result in place of the given "ZERO" after the first time 
the operations are performed, and shifting the sum of the 
partial result and the multiplicand or the passed through 
partial result left to carry out the operations a total 
number of times equal to one less than the number of 
digits in the multiplier, to give a desired product. 

A microprocessor (50) includes a main central 
processing unit (CPU) (70) and a separate direct memory 
access (DMA) CPU (72) in a single integrated circuit 
making up the microprocessor (50). The main CPU (70) has 
a first 16 deep push down stack (74), which has a top item 
register (76) and a next item register (78), respectively 
connected to provide inputs to an arithmetic logic unit 
(ALU) (80) by lines (82) and (84). An output of the ALU 
(80) is connected to the top item register (76) by line 
(86). The output of the top item register at (82) is also 
connected by line (88) to an internal data bus (90). A 
loop counter (92) is connected to a decrementer (94) by 
lines (96) and (98). The loop counter (92) is 
bidirectionally connected to the internal data bus (90) by 
line (100). Stack pointer (102), return stack pointer 
(104), mode register (106) and instruction register (108) 
are also connected to the internal data bus (90) by lines 
(110), (112), (114) and (116), respectively. The internal 
data bus (90) is connected to memory controller (118) and 
to gate (120). The gate (120) provides inputs on lines 
(122), (124), and (126) to X register (128), program 
counter (130) and Y register (132) of return push down 
stack (134). The X register (128), program counter (130) 
and Y register (132) provide outputs to internal address 
bus (136) on lines (138), (140) and (142). The internal 
address bus provides inputs to the memory controller (118) 
and to an incrementer (144). The incrementer (144) 
provides inputs to the X register, program counter and Y 
register via lines (146), (122), (124) and (126). The DMA 
CPU (72) provides inputs to the memory controller (118) on 
line (148). The memory controller (118) is connected to a 
RAM by address/data bus (150) and control lines (152). 